Correspondence Analysis & Related Methods

Elwin Wade
5 years ago

Michael Greenacre

SESSION 3: MULTIDIMENSIONAL SCALING (MDS)
DIMENSION REDUCTION
CLASSICAL MDS
NONMETRIC MDS

Distances and dissimilarities...
For n objects, $d_{ij}$ = distance between object i and object j.

Properties of a distance (metric):
1. $d_{ij} = d_{ji}$
2. $d_{ij} \ge 0$, with $d_{ij} = 0 \iff i = j$
3. $d_{ij} \le d_{ik} + d_{kj}$ (the triangle inequality)

If property 3 is not satisfied, we often talk of a dissimilarity. The chi-square distance is a true distance, whereas Bray-Curtis is a dissimilarity.

Distances and maps...
[Table: distances between the cities Amsterdam, Athens, Barcelona, Basel, Berlin, Bordeaux, ...]

Multidimensional scaling (MDS)
[Figure: the observed table of distances between cities, alongside a fitted map of Amsterdam, Basel, Berlin, Bordeaux, Barcelona and Athens, with fitted distances $\hat{d}_{ij}$ marked between pairs of cities.]
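The three properties above can be checked numerically. The following is a minimal sketch in base R, assuming a tiny made-up abundance table of three samples and two species (not data from the course); the helper names bray_curtis and satisfies_triangle are hypothetical. It illustrates why Bray-Curtis is called a dissimilarity: for these samples the triangle inequality fails, whereas it holds for Euclidean distances on the same data.

```r
# Bray-Curtis dissimilarity between two non-negative abundance vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# TRUE if d[i, j] <= d[i, k] + d[k, j] holds for every triple (i, j, k)
satisfies_triangle <- function(d, tol = 1e-12) {
  n <- nrow(d)
  for (k in seq_len(n)) {
    # outer(d[, k], d[k, ], "+")[i, j] equals d[i, k] + d[k, j]
    if (any(d > outer(d[, k], d[k, ], "+") + tol)) return(FALSE)
  }
  TRUE
}

# hypothetical data: three samples (rows) by two species (columns)
samples <- rbind(a = c(1, 0), b = c(0, 1), c = c(1, 1))
n <- nrow(samples)

# matrix of Bray-Curtis dissimilarities between all pairs of samples
bc <- matrix(0, n, n, dimnames = list(rownames(samples), rownames(samples)))
for (i in seq_len(n)) for (j in seq_len(n))
  bc[i, j] <- bray_curtis(samples[i, ], samples[j, ])

# Euclidean distances between the same samples, for comparison
euclid <- as.matrix(dist(samples))

tri_bc <- satisfies_triangle(bc)
tri_euclid <- satisfies_triangle(euclid)
```

Here the dissimilarity between a and b is 1, while going via c costs only 1/3 + 1/3, so property 3 is violated for Bray-Curtis.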

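Multidimensional scaling (MDS)
Observed distances: $d_{ij}$. Fitted distances: $\hat{d}_{ij}$.

The objective is to minimize some measure of discrepancy, or error, between the observed and the fitted distances:
Minimize $\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2$
or minimize $\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2 / d_{ij}$, also called Sammon's non-linear mapping; R function sammon
or minimize $\sum_{i<j} (f(d_{ij}) - \hat{d}_{ij})^2$ for any monotonically increasing function f
or maximize the agreement between the rank order of the distances in the map and the rank order of the original distances (nonmetric MDS), a similar idea to that of Spearman's rank correlation; R function isoMDS.

Classical MDS
Fits the distances indirectly.
Classical (YoHoToGo*) MDS situates the points in a space of as high dimensionality as possible to reproduce the observed distances, and then projects the points onto low-dimensional subspaces, usually a plane.
[Figure: point i at distance $d_i$ from the centroid, and its projection onto the plane at distance $\hat{d}_i$ from the centroid.]
The method aims to minimize the sum of squares of the projection errors. This is equivalent to maximizing $\sum_i \hat{d}_i^2$ (thanks, Pythagoras!).
The quality of the fit is usually measured by $\sum_i \hat{d}_i^2 / \sum_i d_i^2$, expressed as a %.
R function cmdscale
*YoHoToGo = Young-Householder-Torgerson-Gower

Metric and nonmetric MDS
These methods fit the interpoint distances directly.
Stress measures the discrepancy between the observed distances (data) and the fitted distances (map):
Raw stress: $\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2$
Normalized stress: $\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2 / \sum_{i<j} d_{ij}^2$
Kruskal stress: $\sqrt{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2 / \sum_{i<j} \hat{d}_{ij}^2}$ (used in R function isoMDS for nonmetric MDS; can be thought of as a percentage error)

[Figure: MDS map of Bray-Curtis dissimilarities between samples, with the percentage of fit.]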
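As an illustration of the three R functions named above, here is a minimal sketch. It assumes the MASS package is available (it ships with standard R installations) and uses R's built-in eurodist data, road distances between 21 European cities, as a stand-in for the city table in the slides. The helper stress_measures is hypothetical and computes the stress formulas on the untransformed distances; the stress reported by isoMDS itself is based on a monotone regression of the distances, so the two need not coincide.

```r
library(MASS)  # provides sammon() and isoMDS()

d <- eurodist  # built-in 'dist' object: road distances between 21 cities

# classical MDS: indirect fit via eigendecomposition
fit_classical <- cmdscale(d, k = 2, eig = TRUE)
# Sammon's non-linear mapping: direct fit, weighting errors by 1 / d_ij
fit_sammon <- sammon(d, k = 2, trace = FALSE)
# nonmetric MDS: fits the rank order of the distances
fit_nonmetric <- isoMDS(d, k = 2, trace = FALSE)

# hypothetical helper: stress measures between observed distances and
# the distances in a fitted 2-D configuration (no monotone transformation)
stress_measures <- function(d_obs, points) {
  d_obs <- as.vector(d_obs)
  d_fit <- as.vector(dist(points))
  raw <- sum((d_obs - d_fit)^2)
  c(raw = raw,
    normalized = raw / sum(d_obs^2),
    kruskal = sqrt(raw / sum(d_fit^2)))
}

stress_classical <- stress_measures(d, fit_classical$points)
stress_sammon <- stress_measures(d, fit_sammon$points)
stress_nonmetric <- stress_measures(d, fit_nonmetric$points)

# stress as reported by isoMDS (a percentage)
reported_stress <- fit_nonmetric$stress
```

Plotting any of the fitted configurations, for example with plot(fit_classical$points, type = "n") followed by text(fit_classical$points, labels = labels(d)), gives a map of the cities.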

[Figure: nonmetric MDS map of Bray-Curtis dissimilarities between samples. Stress: 3.5%]

[Figure: MDS map of chi-square distances between samples, with the percentage of fit.]

[Figure: correspondence analysis map showing the sample points together with column points labelled a, b, c, ..., with the percentage of fit.]
Notice that the rows and the columns are depicted in a joint map.
To be continued...

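SESSION 4: CLASSICAL MDS
the computations

In this course we concentrate on the STRUCTURAL methods of multivariate analysis:
methods that reveal continuous structures (scales, dimensions, factors...), which produce a MAP: the factorial methods, namely principal components analysis (PCA), factor analysis (FA) and correspondence analysis (CA); and scaling, namely multidimensional scaling (MDS), either metric MDS or non-metric MDS
methods that reveal discrete structures (clusters, groups, segments, partitions...): cluster analysis, namely hierarchical clustering, which produces a TREE, and non-hierarchical clustering

Basic concept: distance
From a map to a distance matrix
[Figure: a map of four points in the plane and the corresponding squared distance matrix.]

Suppose you have n points $x_i$ (i = 1,...,n) in p-dimensional Euclidean space:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

The squared distance between the i-th and j-th points is

$$d_{ij}^2 = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2$$

and these values are collected in the squared distance matrix

$$D^{(2)} = \begin{pmatrix} d_{11}^2 & d_{12}^2 & \cdots & d_{1n}^2 \\ d_{21}^2 & d_{22}^2 & \cdots & d_{2n}^2 \\ \vdots & \vdots & & \vdots \\ d_{n1}^2 & d_{n2}^2 & \cdots & d_{nn}^2 \end{pmatrix}$$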
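The step from a map to a squared distance matrix can be sketched in a few lines of R. The coordinates below are made up for illustration (the corners of a 3 by 4 rectangle), not the points shown on the slide. The loop applies the definition of the squared distance directly, and the result is compared with R's built-in dist function.

```r
# hypothetical map: four points in the plane (corners of a 3 x 4 rectangle)
X <- rbind(c(0, 0), c(3, 0), c(3, 4), c(0, 4))
n <- nrow(X)

# squared distances from the definition: sum over k of (x_ik - x_jk)^2
D2 <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n)
  D2[i, j] <- sum((X[i, ] - X[j, ])^2)

# the same matrix via the built-in Euclidean distance function
D2_builtin <- as.matrix(dist(X))^2
```

For this rectangle the sides have squared lengths 9 and 16 and the diagonals 25, so D2 contains only the values 0, 9, 16 and 25.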

In matrix notation:

$$D^{(2)} = s 1^T + 1 s^T - 2S$$

where $S = XX^T$ is the matrix of scalar products, $s = \mathrm{diag}(S)$ is the vector of its diagonal elements, and 1 is a vector of n ones.

The problem in scaling: given $D^{(2)}$, solve for X.

If we had S and had to recover X it would be simple: $S = XX^T$. Recall the eigenvalue-eigenvector decomposition of a square symmetric matrix, for example of S:

$$S = U \Lambda U^T, \quad \text{where } U^T U = I \text{ and } \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$$

so a possible solution would be $X = U \Lambda^{1/2}$.

But we don't have the scalar products S, but rather the squared distances $D^{(2)} = s 1^T + 1 s^T - 2S$. We can recover the matrix of scalar products S* with respect to the centroid of the n points by a transformation of $D^{(2)}$ called double-centring:
subtract the row means from all the squared distances
subtract the column means from the resultant matrix
then multiply the double-centred matrix by -1/2 to obtain S*

Then carry on as before: $S^* = U \Lambda U^T$, $X^* = U \Lambda^{1/2}$.

R code to double-centre and eigendecompose:

# read in the squared distance matrix
D <- matrix(c(0,7,7,6,7,0,4,4,7,4,0,5,6,4,5,0), nrow=4)
# compute scalar products by double-centring
n <- nrow(D)
ones <- rep(1,n)
I <- diag(ones)
S <- -0.5*(I-(1/n)*ones%*%t(ones)) %*% D %*% (I-(1/n)*ones%*%t(ones))
# compute eigenvalues and eigenvectors using R function eigen
S.eig <- eigen(S)
# compute coordinates in two dimensions and plot
X <- S.eig$vectors[,1:2] %*% diag(sqrt(S.eig$values[1:2]))
plot(X, type="n")
text(X, labels=1:4)
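The derivation can be checked numerically when the coordinates are known in advance. The sketch below assumes made-up coordinates (the corners of a 3 by 4 rectangle, not data from the course). It builds the squared distances from the scalar products, double-centres them, and confirms that the recovered configuration reproduces the original interpoint distances. The recovered coordinates can differ from the original ones by a translation, rotation or reflection, so only the distances are compared.

```r
# hypothetical known configuration: corners of a 3 x 4 rectangle
X <- rbind(c(0, 0), c(3, 0), c(3, 4), c(0, 4))
n <- nrow(X)
ones <- rep(1, n)

# scalar products and the identity D2 = s 1' + 1 s' - 2 S
S <- X %*% t(X)
s <- diag(S)
D2 <- outer(s, ones) + outer(ones, s) - 2 * S

# double-centring: S_star = -1/2 * J D2 J, with J = I - (1/n) 1 1'
J <- diag(n) - (1 / n) * ones %*% t(ones)
S_star <- -0.5 * J %*% D2 %*% J

# S_star should equal the scalar products of the centred coordinates
Xc <- scale(X, center = TRUE, scale = FALSE)
S_centred <- Xc %*% t(Xc)

# eigendecomposition and recovered two-dimensional coordinates
eig <- eigen(S_star, symmetric = TRUE)
X_star <- eig$vectors[, 1:2] %*% diag(sqrt(eig$values[1:2]))

# interpoint distances of the original and the recovered configurations
d_original <- as.vector(dist(X))
d_recovered <- as.vector(dist(X_star))
```

Because these points lie exactly in a plane, the first two eigenvalues (16 and 9) account for all of the variation and the two sets of distances agree up to rounding error.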
