Classification methods

Noreen Strickland
6 years ago

Multivariate analysis (II): Cluster analysis and Cronbach's alpha. Classification methods

12th JRC Annual Training on Composite Indicators & Multicriteria Decision Analysis (COIN 2014)
dorota.bialowolska@jrc.ec.europa.eu
European Commission, Joint Research Centre, Econometrics and Applied Statistics Unit, Composite Indicators Research Group (JRC-COIN)

Cluster analysis: setting short- and long-term targets

Several Canadian regions may have similar Composite Learning Index (CLI) scores but very different patterns across the seventeen indicators or pillars of learning. To help local authorities identify peer regions that are similarly situated with respect to the individual indicators, we applied cluster analysis: first Ward's method, then k-means clustering.

Cluster analysis: a solution when aggregation cannot be performed

Indicators of objective health:
(1) life expectancy at birth (LE)
(2) infant mortality rate (IM)
(3) potential years of life lost before age 70 (PYLL70)
(4) probability of not reaching age 65 (P65)

Indicators of subjective health: proportions of people
(1) declaring good general health (GH)
(2) reporting no long-standing illnesses (LSI)
(3) reporting no limitations in activities because of health issues (LA)

To depict health conditions in the EU regions, two methods were used: (1) hierarchical clustering with Ward's method and squared Euclidean distance and (2) k-means clustering.

Objective and subjective health measures do not always coincide. The EU is clearly split into the EU-15 and the Central and Eastern European countries, with health conditions considerably better in the western regions of the EU. This division is observed with respect to objective health conditions only. Including self-perceived health status in the analysis changes this picture considerably.

Classification methods

Hierarchical (agglomerative) clustering: at the beginning of the process, each element is in a cluster of its own. The clusters are then sequentially combined into larger clusters, until all elements end up in the same cluster.

K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
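The two-stage approach mentioned above (Ward's method followed by k-means) can be sketched in plain Python. This is only an illustration under stated assumptions, not the procedure used in the studies: the data are hypothetical, the function names are invented for this example, and seeding k-means with the centroids of the Ward clusters is one common way to chain the two methods.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(data, k):
    """Merge singleton clusters with Ward's criterion until k clusters remain."""
    clusters = [[p] for p in data]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def k_means(data, centers, max_iter=100):
    """Assign each point to the nearest mean, recompute means, repeat."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers


# Hypothetical regions described by two normalized indicators.
regions = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8),
           (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]
seeds = [centroid(c) for c in ward_clusters(regions, 2)]
labels, centers = k_means(regions, seeds)
print(labels)
```

Starting k-means from the Ward centroids avoids the dependence on random starting points, at the price of inheriting any early merge that Ward's method got wrong.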

Hierarchical clustering proceeds as follows:

1. Calculate the proximity matrix (this requires choosing a measure of proximity/distance).
2. Find the smallest value in the proximity matrix (apart from the diagonal values) and join the countries associated with this value to form a cluster.
3. Calculate the proximity matrix for the reduced set of countries. For countries not yet combined, the values are the same as in the proximity matrix from step 1. But how should the distance between the cluster and the other countries be established?
4. Choose a clustering method.

Steps 2 and 3 are repeated until a single cluster containing all countries is formed.

Proximity measures: Euclidean, squared Euclidean, Minkowski, Manhattan, Mahalanobis, Chebyshev. The choice of proximity measure may influence the clustering results.

Clustering methods
1. Nearest neighbor = single linkage
2. Furthest neighbor = complete linkage
3. Average linkage
4. Median clustering
5. Ward's minimum variance method
6. etc.
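Steps 1 and 2, together with several of the proximity measures listed, can be sketched as follows. The three "countries" and their coordinates are hypothetical, chosen so that the closest pair differs between measures; the function names are invented for this example. The Mahalanobis distance is left out because it additionally needs the inverse covariance matrix of the indicators.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    # p=1 gives Manhattan, p=2 gives Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def proximity_matrix(data, measure):
    """Step 1: distance between every pair of countries."""
    names = list(data)
    return {(r, c): measure(data[r], data[c]) for r in names for c in names}


def closest_pair(matrix):
    """Step 2: smallest off-diagonal entry (each unordered pair once)."""
    candidates = (pair for pair in matrix if pair[0] < pair[1])
    return min(candidates, key=matrix.get)


# Hypothetical countries with two indicators each.
countries = {"A": (0.0, 0.0), "B": (2.0, 2.0), "C": (0.0, -2.5)}

print(closest_pair(proximity_matrix(countries, euclidean)))  # ('A', 'C')
print(closest_pair(proximity_matrix(countries, chebyshev)))  # ('A', 'B')
```

Under the Euclidean distance A and C are joined first, while under the Chebyshev distance A and B are, which is a small illustration of how the proximity measure can change the resulting clusters.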

Nearest neighbor = single linkage
The distance between a new cluster and a country outside it is the smallest of the distances between the countries in the cluster and the outside country. The distance between two clusters is the distance between their two closest countries (one in each cluster).

Furthest neighbor = complete linkage
The distance between a new cluster and a country outside it is the largest of the distances between the countries in the cluster and the outside country. The distance between two clusters is the distance between their two countries (one in each cluster) that are furthest apart.

Average linkage
The distance between a new cluster and a country outside it is the mean of the distances between the countries in the cluster and the outside country. The distance between two clusters is the mean of all distances between pairs of countries (one in each cluster).

Example (figure not reproduced): proximity measure: squared Euclidean distance; clustering method: furthest neighbor.
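The three linkage rules differ only in how the country-to-country distances between two clusters are summarized (minimum, maximum, or mean). A minimal sketch of the full agglomerative loop, assuming squared Euclidean distance and hypothetical one-indicator data; all names are invented for this illustration:

```python
def sq_euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Each rule reduces the list of pairwise distances to one number.
LINKAGE = {
    "single": min,                                 # nearest neighbor
    "complete": max,                               # furthest neighbor
    "average": lambda ds: sum(ds) / len(ds),       # average linkage
}


def cluster_distance(c1, c2, data, linkage):
    pairwise = [sq_euclid(data[a], data[b]) for a in c1 for b in c2]
    return LINKAGE[linkage](pairwise)


def agglomerate(data, linkage="complete"):
    """Return the merge history as (cluster, cluster, distance) tuples."""
    clusters = [(name,) for name in data]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], data, linkage),
        )
        d = cluster_distance(clusters[i], clusters[j], data, linkage)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical countries measured on a single indicator.
countries = {"A": (0.0,), "B": (1.0,), "C": (3.0,), "D": (7.0,)}

for rule in ("single", "complete", "average"):
    heights = [d for _, _, d in agglomerate(countries, rule)]
    print(rule, heights)
```

On this data the order of the merges is the same under all three rules, but the distances at which the clusters are joined differ (for example 1, 4, 16 for single linkage against 1, 9, 49 for complete linkage). On less clearly separated data the merge order itself can change.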

The less the clustering result depends on the clustering method chosen, the better the quality of the final solution.

It is good practice to normalize the indicators before clustering.

Cronbach's alpha

Cronbach's alpha is regarded as a measure of both internal consistency and reliability. It is a lower bound to the population reliability. In its standard form:

\alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} \operatorname{Var}(X_i)}{\operatorname{Var}\left( \sum_{i=1}^{k} X_i \right)} \right)

where X_i is the i-th indicator and k is the number of indicators.

Cronbach's alpha might be applied to confirm or reject uni-dimensionality.
Indicators are supposed to have the same orientation with regard to the composite.
Cronbach's alpha increases when the number of indicators increases (holding the average correlation between indicators fixed).
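A minimal sketch of the computation, assuming the standard variance-based form alpha = k/(k-1) * (1 - sum of the indicator variances / variance of the summed score). The data are hypothetical and the function name is invented for this example.

```python
from statistics import variance


def cronbach_alpha(indicators):
    """indicators: list of k lists, one per indicator, over the same units."""
    k = len(indicators)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two indicators")
    # Summed score of each unit across all indicators.
    totals = [sum(values) for values in zip(*indicators)]
    item_variance = sum(variance(x) for x in indicators)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical indicators observed on five units.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 3, 4, 5, 6]
x3 = [2, 1, 4, 3, 6]

alpha_aligned = cronbach_alpha([x1, x2, x3])

# Same data, but the third indicator has the opposite orientation.
x3_flipped = [-v for v in x3]
alpha_flipped = cronbach_alpha([x1, x2, x3_flipped])

print(alpha_aligned, alpha_flipped)
```

Flipping the orientation of one indicator lowers alpha sharply (here it even becomes negative), which is why indicators should share the same orientation with regard to the composite before alpha is computed.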

Example (figures not reproduced)

Despite being the most frequently reported statistic supporting the quality of test scores and composites, Cronbach's alpha is criticized.

References:

Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74(1),
Cortina, J. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1),
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: an alternative to coefficient alpha. Psychometrika, 74(1),
Saisana, M. (2008). Composite Learning Index: Robustness Issues and Critical Assessment. JRC Scientific and Technical Reports, EUR doi: /7087
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach's alpha. Psychometrika, 74(1),
Tavakol, M., & Dennick, R. (2011). Making sense of Cronbach's alpha. International Journal of Medical Education, 2, doi: /ijme.4dfb.8dfd
Weziak-Bialowolska, D. (2014). Health conditions in regions of Eastern and Western Europe. International Journal of Public Health, 59(3), doi: /s
