Correspondence Analysis & Related Methods
Michael Greenacre

SESSION 9: (SIMPLE) CORRESPONDENCE ANALYSIS: basic geometric concepts

Overview of CA and basic geometric concepts

312 respondents, all readers of a certain newspaper, cross-tabulated according to their education group and level of reading of the newspaper.

          C1    C2    C3
    E1     5     7     2
    E2    18    46    20
    E3    19    29    39
    E4    12    40    49
    E5     3     7    16

E1: some primary; E2: primary completed; E3: some secondary; E4: secondary completed; E5: some tertiary
C1: glance; C2: fairly thorough; C3: very thorough

[Figure: CA map of the readership data. First principal axis: 0.0704 (84.52 %); second principal axis: 0.0129 (15.48 %).]

Profile

A profile is a set of relative frequencies, that is, a set of frequencies expressed relative to their total (often in percentage form). Each row or each column of a table of frequencies defines a different profile. It is these profiles which CA visualises as points in a map.

original data

          C1    C2    C3   sum
    E1     5     7     2    14
    E2    18    46    20    84
    E3    19    29    39    87
    E4    12    40    49   101
    E5     3     7    16    26
    sum   57   129   126   312

row profiles (each row sums to 1)

          C1    C2    C3
    E1   .36   .50   .14
    E2   .21   .55   .24
    E3   .22   .33   .45
    E4   .12   .40   .49
    E5   .12   .27   .62

column profiles (each column sums to 1)

          C1    C2    C3
    E1   .09   .05   .02
    E2   .32   .36   .16
    E3   .33   .22   .31
    E4   .21   .31   .39
    E5   .05   .05   .13
    sum   57   129   126

[Figure: Row profiles viewed in 3-d.]
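The profile definition above can be sketched in a few lines of plain Python. This is only an illustration, not part of the original slides: the helper name `profiles` is hypothetical, and the table is the readership data as reconstructed here.

```python
# Readership table: rows E1..E5 (education group), columns C1..C3 (reading level).
N = [
    [5, 7, 2],
    [18, 46, 20],
    [19, 29, 39],
    [12, 40, 49],
    [3, 7, 16],
]


def profiles(table):
    """Express every row of `table` relative to its own total (hypothetical helper)."""
    return [[x / sum(row) for x in row] for row in table]


row_profiles = profiles(N)
# Column profiles are the row profiles of the transposed table.
col_profiles = profiles([list(col) for col in zip(*N)])

for label, prof in zip(["E1", "E2", "E3", "E4", "E5"], row_profiles):
    print(label, [round(p, 2) for p in prof])
for label, prof in zip(["C1", "C2", "C3"], col_profiles):
    print(label, [round(p, 2) for p in prof])
```

Each printed profile sums to 1 (up to rounding), which is why a three-element row profile can be plotted inside a triangle.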
Plotting profiles in profile space (triangular coordinates)

[Figure: the row profile E1 = (0.36, 0.50, 0.14) located in the triangle by its three profile values.]

Weighted average (centroid)

[Figure: two points with their average and their weighted average.]

The average is the point at which the two points are balanced.
The situation is identical for multidimensional points...

Plotting profiles in profile space (barycentric or weighted average principle)

[Figure: E1 = (0.36, 0.50, 0.14) positioned as a weighted average of the vertices of the triangle.]
[Figure: the same construction for E2 = (0.21, 0.55, 0.24).]
Plotting profiles in profile space (barycentric or weighted average principle)

[Figure: the same construction for E5 = (0.12, 0.27, 0.62).]

Masses of the profiles

original data

          C1    C2    C3   sum   masses
    E1     5     7     2    14    .045
    E2    18    46    20    84    .269
    E3    19    29    39    87    .279
    E4    12    40    49   101    .324
    E5     3     7    16    26    .083
    sum   57   129   126   312
    average row profile
         .183  .413  .404

Readership data

    Education group             C1           C2           C3        Total   Mass
    E1 Some primary          5 (0.357)    7 (0.500)    2 (0.143)      14   0.045
    E2 Primary completed    18 (0.214)   46 (0.548)   20 (0.238)      84   0.269
    E3 Some secondary       19 (0.218)   29 (0.333)   39 (0.448)      87   0.279
    E4 Secondary completed  12 (0.119)   40 (0.396)   49 (0.485)     101   0.324
    E5 Some tertiary         3 (0.115)    7 (0.269)   16 (0.615)      26   0.083
    Total                   57 (0.183)  129 (0.413)  126 (0.404)     312

C1: glance; C2: fairly thorough; C3: very thorough

Calculating chi-square

    Education group: E5 Some tertiary     C1          C2          C3        Total
    Observed frequency                  3 (0.115)   7 (0.269)  16 (0.615)     26
    Expected frequency                  4.76       10.74       10.50
    Average profile                    (0.183)     (0.413)     (0.404)

For example, the expected frequency of (E5, C1) is 0.183 x 26 = 4.76.

$$\chi^2 = \text{similar terms} \ldots + \frac{(3 - 4.76)^2}{4.76} + \frac{(7 - 10.74)^2}{10.74} + \frac{(16 - 10.50)^2}{10.50} = 26.0$$
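The masses, the average row profile and the expected frequencies can be checked with a short plain-Python sketch. The variable names are illustrative only. Note that exact arithmetic gives 26 x 57/312 = 4.75 for the (E5, C1) cell; the slide's 4.76 comes from multiplying by the rounded average 0.183.

```python
# Readership table: rows E1..E5, columns C1..C3.
N = [
    [5, 7, 2],
    [18, 46, 20],
    [19, 29, 39],
    [12, 40, 49],
    [3, 7, 16],
]

row_totals = [sum(row) for row in N]
col_totals = [sum(col) for col in zip(*N)]
n = sum(row_totals)

# Mass of each row profile: its row total relative to the grand total.
masses = [t / n for t in row_totals]
profiles = [[x / t for x in row] for row, t in zip(N, row_totals)]

# Average row profile = centroid of the row profiles, weighted by the masses.
centroid = [
    sum(m * p[j] for m, p in zip(masses, profiles))
    for j in range(len(col_totals))
]

# Expected frequencies: each row total spread according to the average profile.
expected = [[t * a for a in centroid] for t in row_totals]

print("masses:  ", [round(m, 3) for m in masses])
print("centroid:", [round(a, 3) for a in centroid])
print("expected E5:", [round(e, 2) for e in expected[4]])
```

The weighted centroid coincides with the column totals divided by the grand total, which is why the average row profile can be read directly off the bottom margin of the table.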
Calculating chi-square

$$\chi^2 = \text{similar terms} \ldots + 26 \left[ \frac{(3/26 - 4.76/26)^2}{4.76/26} + \frac{(7/26 - 10.74/26)^2}{10.74/26} + \frac{(16/26 - 10.50/26)^2}{10.50/26} \right]$$

$$\chi^2 / 312 = \text{similar terms} \ldots + 0.083 \left[ \frac{(0.115 - 0.183)^2}{0.183} + \frac{(0.269 - 0.413)^2}{0.413} + \frac{(0.615 - 0.404)^2}{0.404} \right]$$

Calculating inertia

$$\text{Inertia} = \chi^2 / 312 = \text{similar terms for first four rows} \ldots + 0.083 \left[ \frac{(0.115 - 0.183)^2}{0.183} + \frac{(0.269 - 0.413)^2}{0.413} + \frac{(0.615 - 0.404)^2}{0.404} \right]$$

Here 0.083 is the mass (of row E5), and the expression in square brackets is the squared chi-square distance (between the profile of E5 and the average profile).

$$\text{Inertia} = \sum \text{mass} \times (\text{chi-square distance})^2$$

The squared chi-square distance is EUCLIDEAN in form (a sum of squared differences) but WEIGHTED (each term is divided by the corresponding element of the average profile):

$$\frac{(0.115 - 0.183)^2}{0.183} + \frac{(0.269 - 0.413)^2}{0.413} + \frac{(0.615 - 0.404)^2}{0.404}$$

How can we see chi-square distances?

Rewrite the weighted form as

$$\left( \frac{0.115}{\sqrt{0.183}} - \frac{0.183}{\sqrt{0.183}} \right)^2 + \left( \frac{0.269}{\sqrt{0.413}} - \frac{0.413}{\sqrt{0.413}} \right)^2 + \left( \frac{0.615}{\sqrt{0.404}} - \frac{0.404}{\sqrt{0.404}} \right)^2$$

which is a Pythagorean, ordinary Euclidean distance. So the answer is to divide all profile elements by the square root of their averages: ordinary Euclidean distances between the transformed profiles are then chi-square distances.

[Figure: "Stretched" row profiles viewed in 3-d chi-squared space.]
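The chain from chi-square to inertia to "stretched" profiles can be traced numerically. The following plain-Python sketch (illustrative names, readership data as reconstructed in this document) computes the chi-square statistic cell by cell, the inertia as the mass-weighted sum of squared chi-square distances, and the same distances as ordinary Euclidean distances after dividing the profile elements by the square roots of the average profile.

```python
from math import sqrt

# Readership table: rows E1..E5, columns C1..C3.
N = [
    [5, 7, 2],
    [18, 46, 20],
    [19, 29, 39],
    [12, 40, 49],
    [3, 7, 16],
]

row_totals = [sum(row) for row in N]
col_totals = [sum(col) for col in zip(*N)]
n = sum(row_totals)
average = [ct / n for ct in col_totals]  # average row profile

chi2 = 0.0
inertia = 0.0
d2_weighted = []   # squared chi-square distances to the average profile
d2_stretched = []  # squared Euclidean distances between stretched profiles

for row, t in zip(N, row_totals):
    mass = t / n
    profile = [x / t for x in row]

    # Chi-square contribution of this row: (observed - expected)^2 / expected.
    chi2 += sum((x - t * a) ** 2 / (t * a) for x, a in zip(row, average))

    # Weighted Euclidean form of the squared chi-square distance.
    d2 = sum((p - a) ** 2 / a for p, a in zip(profile, average))
    d2_weighted.append(d2)
    inertia += mass * d2

    # Stretch every coordinate by 1/sqrt(average), then use ordinary distance.
    stretched_profile = [p / sqrt(a) for p, a in zip(profile, average)]
    stretched_average = [sqrt(a) for a in average]  # a / sqrt(a)
    d2_stretched.append(
        sum((s - c) ** 2 for s, c in zip(stretched_profile, stretched_average))
    )

print("chi-square:", round(chi2, 1))
print("inertia:   ", round(inertia, 4), "= chi-square / n =", round(chi2 / n, 4))
```

The two distance lists agree to floating-point precision, which is the numerical counterpart of the "stretched space" argument above.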
Summary: Basic geometric concepts

Profiles are rows or columns of relative frequencies, that is, the rows or columns expressed relative to their respective marginals, or bases.
Each profile has a weight assigned to it, called the mass, which is proportional to the original marginal frequency used as a base.
The average profile is the centroid (weighted average) of the profiles.
Vertex profiles are the extreme profiles in the profile space ("simplex").
Profiles are weighted averages of the vertices, using the profile elements as weights.
The dimensionality of an I x J matrix = min{I-1, J-1}.
The chi-square distance measures the difference between profiles, using a Euclidean-type function which standardizes each profile element by dividing by the square root of its expected value.
The (total) inertia can be expressed as the weighted average of the squared chi-square distances between the profiles and their average.

The famous smoking data: row problem
(see "Correspondence Analysis in Practice")

Artificial example designed to illustrate two-dimensional maps.
193 employees of a firm; 5 categories of staff group; 4 categories of smoking (none, light, medium, heavy).

                        no    li    me    hv
    Senior managers      4     2     3     2
    Junior managers      4     3     7     4
    Senior employees    25    10    12     4
    Junior employees    18    24    33    13
    Secretaries         10     6     7     2

row profiles

                        no    li    me    hv
    Senior managers    .36   .18   .27   .18
    Junior managers    .22   .17   .39   .22
    Senior employees   .49   .20   .24   .08
    Junior employees   .20   .27   .38   .15
    Secretaries        .40   .24   .28   .08
    ave                .32   .23   .32   .13

[Figure: View of row profiles in 3-d.]

The famous smoking data: column problem

                        no    li    me    hv
    Senior managers      4     2     3     2
    Junior managers      4     3     7     4
    Senior employees    25    10    12     4
    Junior employees    18    24    33    13
    Secretaries         10     6     7     2

It seems like the column profiles, with 5 elements, are 4-dimensional, BUT there are only 4 points, and 4 points lie exactly in 3 dimensions. So the dimensionality of the columns is the same as that of the rows.

column profiles

                        no    li    me    hv   ave
    Senior managers    .07   .04   .05   .08   .06
    Junior managers    .07   .07   .11   .16   .09
    Senior employees   .41   .22   .19   .16   .26
    Junior employees   .30   .53   .53   .52   .46
    Secretaries        .16   .13   .11   .08   .13
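The dimensionality claim for the smoking data can be checked numerically. The sketch below assumes NumPy is available and uses the matrix rank of the centred profiles as a stand-in for "dimensionality"; the explicit tolerance `1e-10` is an arbitrary choice for deciding which singular values count as zero.

```python
import numpy as np

# Smoking data: rows SM, JM, SE, JE, SC; columns none, light, medium, heavy.
N = np.array([
    [4, 2, 3, 2],
    [4, 3, 7, 4],
    [25, 10, 12, 4],
    [18, 24, 33, 13],
    [10, 6, 7, 2],
], dtype=float)

n = N.sum()

# Row profiles (5 points with 4 elements) and their average profile.
row_profiles = N / N.sum(axis=1, keepdims=True)
row_average = N.sum(axis=0) / n

# Column profiles (4 points with 5 elements) and their average profile.
col_profiles = (N / N.sum(axis=0)).T
col_average = N.sum(axis=1) / n

# Dimensionality of each cloud = rank of the profiles centred at their average.
rank_rows = np.linalg.matrix_rank(row_profiles - row_average, tol=1e-10)
rank_cols = np.linalg.matrix_rank(col_profiles - col_average, tol=1e-10)

print(rank_rows, rank_cols, min(N.shape) - 1)
```

Both clouds come out with the same rank, min{I-1, J-1}, even though the column profiles have five elements each.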
[Figure: View of column profiles in 3-d.]

[Figure: View of both profiles and vertices in 3-d.]
SESSION 10: (SIMPLE) CORRESPONDENCE ANALYSIS: SVD theory

What CA does

Centres the row and column profiles with respect to their average profiles, so that the origin represents the average.
Re-defines the dimensions of the space in an ordered way: the first dimension explains the maximum amount of inertia possible in one dimension; the second adds the maximum amount to the first (hence the first two explain the maximum amount in two dimensions), and so on until all dimensions are explained.
Decomposes the total inertia along the principal axes into principal inertias, usually expressed as % of the total.
So if we want a low-dimensional version, we just take the first (principal) dimensions.

The row and column problem solutions are closely related: one can be obtained from the other, and there are simple scaling factors along each dimension relating the two problems.

Generalized SVD (repeat)

We often want to associate weights with the rows and columns, so that the fit is by weighted least squares, not ordinary least squares; that is, we want to minimize

$$\mathrm{RSS} = \sum_{i=1}^{n} \sum_{j=1}^{p} r_i c_j \, (x_{ij} - x^*_{ij})^2$$

$$D_r^{1/2} X D_c^{1/2} = U D_\alpha V^T \quad \text{where } U^T U = V^T V = I, \quad \alpha_1 \ge \alpha_2 \ge \cdots \ge 0$$

$$X = D_r^{-1/2} U D_\alpha \, (D_c^{-1/2} V)^T, \qquad X^* = \text{etc.}$$

Weighted metric multidimensional scaling (repeat)

Suppose we want to represent the (centred) rows of a matrix Y, weighted by the (positive) elements down the diagonal of matrix $D_q$, where distance between rows is in the (weighted) metric defined by matrix $D_m^{-1}$.

$$\text{inertia} = \sum_i \sum_j q_i \, (1/m_j) \, y_{ij}^2$$

$$S = D_q^{1/2} Y D_m^{-1/2} = U D_\alpha V^T \quad \text{where } U^T U = V^T V = I$$

Principal coordinates of rows: $F = D_q^{-1/2} U D_\alpha$
Principal axes of the rows: $D_m^{1/2} V$
Standard coordinates of columns: $G = D_m^{-1/2} V$
Variances (inertias) explained: $\lambda_1 = \alpha_1^2, \ \lambda_2 = \alpha_2^2, \ldots$
Correspondence analysis

Of the rows:
Y is the centred matrix of row profiles
row masses in $D_q$ are the relative frequencies of the rows
column weights in $D_w$ are the inverses of the relative frequencies of the columns
inertia = $\chi^2 / n$

Of the columns:
Y is the centred matrix of column profiles
column masses in $D_q$ are the relative frequencies of the columns
row weights in $D_w$ are the inverses of the relative frequencies of the rows
inertia = $\chi^2 / n$

Both problems lead to the SVD of the same matrix S.

Correspondence analysis

Table of nonnegative data N.
Divide N by its grand total n to obtain the so-called correspondence matrix $P = (1/n) N$.
Let the row and column marginal totals of P be the vectors r and c respectively, that is, the vectors of row and column masses, and let $D_r$ and $D_c$ be the diagonal matrices of these masses.

$$S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}$$

or equivalently

$$S = D_r^{1/2} (D_r^{-1} P D_c^{-1} - 1 1^T) D_c^{1/2}, \qquad s_{ij} = \frac{p_{ij} - r_i c_j}{\sqrt{r_i c_j}}$$

With the SVD $S = U D_\alpha V^T$:

Principal coordinates (to be derived algebraically in class):
$$F = D_r^{-1/2} U D_\alpha, \qquad G = D_c^{-1/2} V D_\alpha$$

Standard coordinates:
$$\Phi = D_r^{-1/2} U, \qquad \Gamma = D_c^{-1/2} V$$

Decomposition of total inertia along principal axes
Duality (symmetry) of the rows and columns

                       I rows (smoking I=5)      J columns (smoking J=4)
    Total inertia      in(I)  0.08519            in(J)  0.08519
    Inertia axis 1     λ1  0.07476 (87.8%)       λ1  0.07476
    Inertia axis 2     λ2  0.01002 (11.8%)       λ2  0.01002
    Inertia axis 3     λ3  0.00041 ( 0.5%)       λ3  0.00041

                        no    li    me    hv   sum
    Senior managers      4     2     3     2    11
    Junior managers      4     3     7     4    18
    Senior employees    25    10    12     4    51
    Junior employees    18    24    33    13    88
    Secretaries         10     6     7     2    25
    sum                 61    45    62    25   193

row profiles

                        no    li    me    hv   masses
    Senior managers    .36   .18   .27   .18    .06
    Junior managers    .22   .17   .39   .22    .09
    Senior employees   .49   .20   .24   .08    .26
    Junior employees   .20   .27   .38   .15    .46
    Secretaries        .40   .24   .28   .08    .13
    ave                .32   .23   .32   .13

column profiles

                        no    li    me    hv   ave
    Senior managers    .07   .04   .05   .08   .06
    Junior managers    .07   .07   .11   .16   .09
    Senior employees   .41   .22   .19   .16   .26
    Junior employees   .30   .53   .53   .52   .46
    Secretaries        .16   .13   .11   .08   .13
    masses             .32   .23   .32   .13
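The computational steps above can be sketched with NumPy on the smoking data. This is a minimal illustration under the notation of these slides, not a replacement for dedicated CA software; the variable names are chosen here for readability.

```python
import numpy as np

# Smoking data: rows SM, JM, SE, JE, SC; columns none, light, medium, heavy.
N = np.array([
    [4, 2, 3, 2],
    [4, 3, 7, 4],
    [25, 10, 12, 4],
    [18, 24, 33, 13],
    [10, 6, 7, 2],
], dtype=float)

P = N / N.sum()      # correspondence matrix
r = P.sum(axis=1)    # row masses
c = P.sum(axis=0)    # column masses

# S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}, computed element by element.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, alpha, Vt = np.linalg.svd(S, full_matrices=False)

# Keep only the min{I-1, J-1} non-trivial dimensions.
K = min(N.shape) - 1
U, alpha, V = U[:, :K], alpha[:K], Vt.T[:, :K]

lam = alpha ** 2                   # principal inertias
Phi = U / np.sqrt(r)[:, None]      # row standard coordinates
Gam = V / np.sqrt(c)[:, None]      # column standard coordinates
F = Phi * alpha                    # row principal coordinates
G = Gam * alpha                    # column principal coordinates

print("total inertia:     ", round(lam.sum(), 5))
print("principal inertias:", np.round(lam, 5))
print("percentages:       ", np.round(100 * lam / lam.sum(), 1))
```

The signs of individual axes depend on the SVD routine, so the coordinates may be reflected relative to published maps; the principal inertias and percentages are unaffected.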
Relationship between row and column solutions

                              rows                   columns
    standard coordinates      Φ = [ φ_ik ]           Γ = [ γ_jk ]
    principal coordinates     F = [ f_ik ]           G = [ g_jk ]

Relationships between coordinates:

$$F = \Phi D_\alpha, \qquad G = \Gamma D_\alpha$$

$$f_{ik} = \alpha_k \varphi_{ik}, \qquad g_{jk} = \alpha_k \gamma_{jk}$$

where $\alpha_k = \sqrt{\lambda_k}$ is the square root of the principal inertia on axis k. For the smoking data, $\sqrt{\lambda_1} = \sqrt{0.0748} \approx 0.273$ and $\sqrt{\lambda_2} = \sqrt{0.0100} \approx 0.100$.

principal = standard × α_k
standard = principal / α_k

[Figure: Vertex profiles in standard coordinates; data profiles in principal coordinates.]

[Figure: Symmetric map using XLSTAT. First axis: 0.0748 (87.8 %); second axis: 0.0100 (11.8 %). Row points: Senior Managers, Junior Managers, Senior Employees, Junior Employees, Secretaries. Column points: none, light, medium, heavy.]

Summary: Relationship between row and column solutions

1. same dimensionality (rank) = min{I-1, J-1}
2. same total inertia and same principal inertias λ1, λ2, ..., on each dimension (i.e., same decomposition of inertia along principal axes), hence same percentages of inertia on each dimension
3. same coordinate solutions, up to a scalar constant along each principal axis, which depends on the square root √λ_k = α_k of the principal inertia on each axis:
   principal = standard × √λ_k
   standard = principal / √λ_k
4. Asymmetric map: one set principal, other standard
5. Symmetric map: both sets principal
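The scaling relationship between the two sets of coordinates can be verified numerically on the smoking data. The sketch below assumes NumPy. It also checks a standard result that the slides' vertex pictures rely on but do not write out as a formula: row principal coordinates are the weighted averages of the column vertices (column standard coordinates), with the row profile elements as weights, and symmetrically for the columns.

```python
import numpy as np

# Smoking data: rows SM, JM, SE, JE, SC; columns none, light, medium, heavy.
N = np.array([
    [4, 2, 3, 2],
    [4, 3, 7, 4],
    [25, 10, 12, 4],
    [18, 24, 33, 13],
    [10, 6, 7, 2],
], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, alpha, Vt = np.linalg.svd(S, full_matrices=False)
K = min(N.shape) - 1                    # non-trivial dimensions only
U, alpha, V = U[:, :K], alpha[:K], Vt.T[:, :K]

Phi = U / np.sqrt(r)[:, None]           # row standard coordinates
Gam = V / np.sqrt(c)[:, None]           # column standard coordinates
F = Phi * alpha                         # principal = standard x alpha_k
G = Gam * alpha

# standard = principal / alpha_k recovers the standard coordinates.
print(np.allclose(F / alpha, Phi), np.allclose(G / alpha, Gam))

# Barycentric check (assumption stated in the lead-in): profiles times vertices.
row_profiles = P / r[:, None]           # D_r^{-1} P
col_profiles = (P / c).T                # D_c^{-1} P^T
F_bary = row_profiles @ Gam
G_bary = col_profiles @ Phi
print(np.allclose(F, F_bary), np.allclose(G, G_bary))
```

In an asymmetric map of the rows one would plot `F` together with `Gam`; in a symmetric map, `F` together with `G`.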