On the Variance of Eigenvalues in PCA and MCA


Jean-Luc Durand
Laboratoire d'Éthologie Expérimentale et Comparée (LEEC, EA 4443), Université Paris 13, Sorbonne Paris Cité
CARME, Naples, September 2015
J.-L. Durand, On the Variance of Eigenvalues, 1/41

CA, PCA and MCA
Which statistic measures the overall magnitude of the relations between variables in CA, PCA and MCA?
- the sum of the eigenvalues in CA;
- the variance of the eigenvalues in PCA and MCA.

Part 1: Principal Component Analysis (PCA on the correlation matrix)
Overall viewpoint: correlations and eigenvalues.
Local viewpoint: contributions of each variable to the axes.

PCA: Measure of the Overall Magnitude of Correlations
Definition. The average linkage index of p numerical variables is
ALI = Σ_{k≠k'} r²_{kk'} / (p(p − 1)),
the mean of the squared correlations between variables. It satisfies
0 ≤ min_{(k,k')} r²_{kk'} ≤ ALI ≤ max_{(k,k')} r²_{kk'} ≤ 1.
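As a minimal sketch of the definition above (in NumPy, with an illustrative 3×3 correlation matrix whose values are hypothetical, not from the talk), the ALI is just the mean of the squared off-diagonal correlations:

```python
import numpy as np

# Illustrative correlation matrix for p = 3 variables (hypothetical values).
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
p = R.shape[0]
off = ~np.eye(p, dtype=bool)                  # the p(p-1) off-diagonal cells
ALI = (R[off] ** 2).sum() / (p * (p - 1))     # mean squared correlation
print(round(ALI, 4))                          # → 0.2333
```

The same mask-based computation works for any correlation matrix, whatever p.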

PCA: Variance of Eigenvalues
Variance of the p eigenvalues:
Var(λ) = Σ_{l=1}^{p} (λ_l − 1)² / p
(the mean eigenvalue is 1, since Σ_l λ_l = p). Bounds: 0 ≤ Var(λ) ≤ p − 1.
Two extreme situations:
- p uncorrelated variables: r_{kk'} = 0 (k ≠ k'), so λ_1 = ... = λ_p = 1 and Var(λ) = 0 (spherical cloud);
- p perfectly correlated variables: r²_{kk'} = 1, so λ_1 = p, λ_2 = ... = λ_p = 0 and Var(λ) = p − 1 (unidimensional cloud).

PCA: Theorem
In PCA of p standardized variables:
Var(λ) = (p − 1) ALI.
The Average Linkage Index is thus a measure:
- of the overall magnitude of correlations;
- of the departure of the cloud from sphericity.
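The theorem is an algebraic identity (it follows from trace(R²) = Σ λ_l²), so it can be checked to machine precision on any data set. A sketch assuming NumPy, with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)
# Arbitrary correlated data: 200 observations of p = 5 variables.
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
R = np.corrcoef(X, rowvar=False)              # PCA on the correlation matrix
p = R.shape[0]
lam = np.linalg.eigvalsh(R)                   # eigenvalues; their mean is exactly 1
var_lam = ((lam - 1) ** 2).mean()             # Var(lambda)
ALI = (R[~np.eye(p, dtype=bool)] ** 2).mean() # average linkage index
assert abs(var_lam - (p - 1) * ALI) < 1e-10   # Var(lambda) = (p - 1) ALI
```

The assertion holds for any data, not just this seed, because both sides equal Σ_{k≠k'} r²_{kk'} / p.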

Spearman Example: Correlations
SPEARMAN C., "General Intelligence, Objectively Determined and Measured", The American Journal of Psychology, Vol. 15, No. 2, April 1904 (p. 291).
[Correlation matrix of the six variables: Literature (Lit), French (Fre), English (Eng), Mathematics (Mat), Auditive discrimination (Aud), Music (Mus).]
ALI = .3899, √ALI = .62.

Spearman Example: Eigenvalues
λ_1, λ_2, ..., λ_6, with Σ_l λ_l = 6.
[Bar chart of the six eigenvalues.]
Var(λ) = 5 × ALI = 5 × .3899 = 1.9495.

PCA: Linkage Index of a Variable
Definition. Given p numerical variables, the linkage index of variable k is
LI_k = Σ_{k'≠k} r²_{kk'} / (p − 1),
the mean of the squared correlations between variable k and the others. It satisfies
0 ≤ min_{k'} r²_{kk'} ≤ LI_k ≤ max_{k'} r²_{kk'} ≤ 1.
Spearman example: LI_Lit = .5240, LI_Fre = .4668, LI_Eng = .4037, LI_Mat = .3622, LI_Aud = .3024, LI_Mus = .2804; ALI = .3899.
Mean: Σ_k LI_k / p = ALI.

PCA: Linkage Ratio of a Variable
Definition. The linkage ratio of variable k is LR_k = LI_k / ALI.
Mean: Σ_k LR_k / p = 1.
Spearman example: LR_Lit = 1.34, LR_Fre = 1.20, LR_Eng = 1.04, LR_Mat = 0.93, LR_Aud = 0.78, LR_Mus = 0.72; mean = 1.
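These per-variable indexes are row means of the squared correlation matrix, minus the diagonal. A sketch in NumPy (the correlation values are hypothetical, not Spearman's):

```python
import numpy as np

# Illustrative correlation matrix for p = 3 variables (hypothetical values).
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
p = R.shape[0]
LI = ((R ** 2).sum(axis=1) - 1.0) / (p - 1)  # drop the diagonal term r_kk^2 = 1
ALI = LI.mean()                              # the mean of the LI_k is the ALI
LR = LI / ALI                                # linkage ratios; their mean is 1
```

By construction `LR.mean()` is exactly 1, which mirrors the "Mean: Σ_k LR_k / p = 1" property above.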

Spearman Example: Plane 1-2
[Table of the coefficients r_{kl} of the six variables on Axes 1 and 2.]
For each axis l ∈ {1, ..., p}: Σ_k r²_{kl} = λ_l.
For each variable k ∈ {1, ..., p}: Σ_l r²_{kl} = 1.
[Correlation circle in plane 1-2: Axis 1 (68.4%), Axis 2 (10.3%), showing Lit, Fre, Eng, Mat, Aud and Mus.]

Spearman Example: Contributions to Axes
[Table of the contributions of the variables to Axes 1-6 (Ctr_k^l), with their sums and the eigenvalues.]
Contribution of variable k to axis l: Ctr_k^l = r²_{kl} / λ_l.
For each axis l: Σ_k Ctr_k^l = 1. For each variable k: Σ_l Ctr_k^l = 1.
w_k: vector of the contributions of variable k to the axes. For each k: Corr(w_k, λ) = 0.

Property of Uncorrelated Positive Variables
Let x and y be two positive variables with n values.
Mean of x: x̄ = Σ x_i / n.
Weighted average of x by y (the "y-average" of x): Avg_y(x) = Σ y_i x_i / Σ y_i.
Property: r_{xy} = 0 ⟺ x̄ = Avg_y(x) ⟺ ȳ = Avg_x(y).
In PCA on the correlation matrix, for each k: λ̄ = 1 = Avg_{w_k}(λ).

Expansion Ratio
Let x and w be two uncorrelated positive variables.
Variance of x: Var(x) = Σ (x_i − x̄)² / n.
Weighted variance of x by w (the "w-variance" of x): Var_w(x) = Σ w_i (x_i − x̄)² / Σ w_i.
Definition. The w-expansion ratio of x is XR_w(x) = Var_w(x) / Var(x).
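These definitions translate directly into code. A sketch in NumPy (the helper names `avg_w`, `var_w` and `xr` are made up here for illustration), including a small check of the uncorrelatedness property:

```python
import numpy as np

def avg_w(x, w):
    """Weighted average of x by w (the 'w-average' of x)."""
    return (w * x).sum() / w.sum()

def var_w(x, w):
    """Weighted variance of x by w, taken around the plain mean of x."""
    return (w * (x - x.mean()) ** 2).sum() / w.sum()

def xr(x, w):
    """w-expansion ratio of x: weighted variance over plain variance."""
    return var_w(x, w) / x.var()

# When x and w are uncorrelated, the w-average of x equals its plain mean:
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([2.0, 1.0, 1.0, 2.0])   # chosen so that Corr(x, w) = 0
assert abs(avg_w(x, w) - x.mean()) < 1e-12
```

Note that `var_w` centers on the plain mean x̄; this matches the slide, and is consistent because for uncorrelated x and w the w-average of x is x̄ anyway.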

PCA: Theorem
In PCA of p standardized variables, for each variable k:
Var_{w_k}(λ) = (p − 1) LI_k, hence XR_{w_k}(λ) = LR_k.
Interpretation:
- the higher the linkage ratio of a variable, the higher the contributions of this variable to the extreme eigenvalues;
- the lower the linkage ratio of a variable, the higher the contributions of this variable to the central eigenvalues.

Spearman Example: Literature
Linkage ratio: XR = 1.34.
[Bar chart: contributions of Lit to the axes (%).]
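This theorem can also be checked numerically: in PCA on the correlation matrix the contribution of variable k to axis l equals the squared entry of the unit eigenvector, Ctr_k^l = r²_{kl} / λ_l = u²_{kl}. A sketch assuming NumPy, with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Arbitrary correlated data: 300 observations of p = 4 variables.
X = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 4))
R = np.corrcoef(X, rowvar=False)
p = R.shape[0]
lam, U = np.linalg.eigh(R)            # eigenvalues and unit eigenvectors of R
Ctr = U ** 2                          # Ctr_k^l = r_kl^2 / lambda_l = u_kl^2
for k in range(p):
    w_k = Ctr[k]                      # contributions of variable k to the axes
    var_wk = (w_k * (lam - 1) ** 2).sum() / w_k.sum()   # w_k-variance of lambda
    LI_k = ((R[k] ** 2).sum() - 1.0) / (p - 1)          # linkage index of k
    assert abs(var_wk - (p - 1) * LI_k) < 1e-10
```

Here too the identity is exact: Σ_l u²_{kl}(λ_l − 1)² = trace terms of R² restricted to row k, which reduces to (p − 1) LI_k.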

Spearman Example: French
Linkage ratio: XR = 1.20.
[Bar chart: contributions of Fre to the axes (%).]

Spearman Example: English
Linkage ratio: XR = 1.04.
[Bar chart: contributions of Eng to the axes (%).]

Spearman Example: Mathematics
Linkage ratio: XR = 0.93.
[Bar chart: contributions of Mat to the axes (%).]

Spearman Example: Auditive Discrimination
Linkage ratio: XR = 0.78.
[Bar chart: contributions of Aud to the axes (%).]

Spearman Example: Music
Linkage ratio: XR = 0.72.
[Bar chart: contributions of Mus to the axes (%).]

PCA: Summary
In PCA on the correlation matrix:
- the variance of the eigenvalues is proportional to the average linkage index;
- the distribution of the contributions of a variable to the axes depends on the linkage ratio of this variable.

Part 2: Multiple Correspondence Analysis
Overall viewpoint: Φ² coefficients and eigenvalues.
Local viewpoint: contributions of each question to the axes.

MCA: Burt's Data
BURT C., "The Factorial Analysis of Qualitative Data", British Journal of Statistical Psychology, Vol. 3, Issue 3, November 1950 (p. 177).
[Burt table crossing the four questions: Hair (fair, red, dark), Eyes (light, mixed, brown), Head (narrow, wide), Stature (tall, short); the Stature diagonal block counts 43 tall and 57 short.]
Q = 4 questions; K = Σ_q K_q = 3 + 3 + 2 + 2 = 10 categories.
Average number of categories per question: K/Q = 2.5.

MCA: Φ² Coefficients
[Φ² table between the four questions: Hair, Eyes, Head, Stature.]

MCA: Magnitude of the Relationships between Questions
Definition. The average linkage index of Q questions with K categories is
ALI = Φ̄² / (K/Q − 1), with Φ̄² = Σ_{q≠q'} Φ²_{qq'} / (Q(Q − 1))
(the mean of the Φ²_{qq'} coefficients, q ≠ q'). It satisfies 0 ≤ ALI ≤ 1.
Burt example: Φ̄² = …, ALI = ….
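The Φ² coefficient between two questions is the chi-squared statistic of their contingency table divided by n. A sketch in NumPy (the `phi2` helper name is made up here, and the categorical data are simulated, not Burt's):

```python
import numpy as np

def phi2(x, y, Kx, Ky):
    """Phi-squared (chi-squared statistic / n) between two categorical
    variables coded 0..Kx-1 and 0..Ky-1."""
    t = np.zeros((Kx, Ky))
    np.add.at(t, (x, y), 1)                       # contingency table
    n = t.sum()
    e = np.outer(t.sum(axis=1), t.sum(axis=0)) / n  # expected counts
    return ((t - e) ** 2 / e).sum() / n

rng = np.random.default_rng(7)
Ks = [3, 3, 2, 2]                                 # K_q, as in Burt's data
data = [rng.integers(0, Kq, 100) for Kq in Ks]    # 100 simulated answers
Q, K = len(Ks), sum(Ks)
mean_phi2 = np.mean([phi2(data[q], data[r], Ks[q], Ks[r])
                     for q in range(Q) for r in range(Q) if r != q])
ALI = mean_phi2 / (K / Q - 1)                     # average linkage index
```

With independently simulated questions the ALI comes out close to 0, as the spherical-cloud extreme predicts.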

MCA: Variance of Eigenvalues
Burt example: eigenvalues λ_1, ..., λ_6, with Σ_l λ_l = K/Q − 1 = 1.5 and Mean(λ) = 1/Q = 0.25.
[Bar chart of the six eigenvalues.]
Var(λ) = Σ_{l=1}^{K−Q} (λ_l − 1/Q)² / (K − Q).

MCA: Variance of Eigenvalues
Bounds: 0 ≤ Var(λ) ≤ (Q − 1)/Q².
Two extreme situations:
- Q independent questions: Φ²_{qq'} = 0 (q ≠ q'), so λ_1 = ... = λ_{K−Q} = 1/Q and Var(λ) = 0 (spherical cloud);
- Q equivalent questions (all K_q equal, Φ²_{qq'} = K_q − 1 = K/Q − 1): λ_l = 1 for 1 ≤ l ≤ K/Q − 1 and λ_l = 0 for K/Q ≤ l ≤ K − Q, so Var(λ) = (Q − 1)/Q² ((K/Q − 1)-dimensional cloud).

MCA: Theorem
In MCA of a table with Q questions:
Var(λ) = ((Q − 1)/Q²) ALI.
The Average Linkage Index is a measure:
- of the overall magnitude of the relations between questions;
- of the departure of the cloud from sphericity.

MCA: Linkage Index of a Question
Definition. Given Q questions, the linkage index of question q with K_q categories is
LI_q = Φ̄²_q / (K_q − 1), with Φ̄²_q = Σ_{q'≠q} Φ²_{qq'} / (Q − 1)
(the mean of the Φ² coefficients between question q and the others). It satisfies 0 ≤ LI_q ≤ 1.
Average of the linkage indexes weighted by the numbers of categories minus 1:
Σ_q (K_q − 1) LI_q / (K − Q) = ALI.
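The MCA theorem can be verified end to end with a small simulation: run the MCA as a correspondence analysis of the indicator matrix, keep the K − Q nontrivial eigenvalues, and compare their variance with ((Q − 1)/Q²) ALI. A sketch assuming NumPy (the `phi2` helper name is made up, and the data are simulated, not Burt's):

```python
import numpy as np

def phi2(x, y, Kx, Ky):
    """Phi-squared (chi-squared statistic / n) between two coded variables."""
    t = np.zeros((Kx, Ky))
    np.add.at(t, (x, y), 1)
    n = t.sum()
    e = np.outer(t.sum(axis=1), t.sum(axis=0)) / n
    return ((t - e) ** 2 / e).sum() / n

rng = np.random.default_rng(3)
n, Ks = 400, [3, 3, 2, 2]
Q, K = len(Ks), sum(Ks)
data = [rng.integers(0, Kq, n) for Kq in Ks]

# MCA as correspondence analysis of the n x K indicator matrix Z.
Z = np.hstack([np.eye(Kq)[x] for x, Kq in zip(data, Ks)])
P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals
lam = np.sort(np.linalg.svd(S, compute_uv=False) ** 2)[::-1][:K - Q]

var_lam = ((lam - 1 / Q) ** 2).mean()                 # K - Q eigenvalues, mean 1/Q
mean_phi2 = np.mean([phi2(data[q], data[s], Ks[q], Ks[s])
                     for q in range(Q) for s in range(Q) if s != q])
ALI = mean_phi2 / (K / Q - 1)
assert abs(var_lam - (Q - 1) / Q ** 2 * ALI) < 1e-9   # Var(lambda) = ((Q-1)/Q^2) ALI
```

The check rests on the classical identity that the sum of the squared MCA eigenvalues equals the mean of the Φ² coefficients over all Q² pairs of questions (diagonal pairs contributing Φ²_{qq} = K_q − 1).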

MCA: Linkage Indexes of the Questions
[Table: for each question (Hair, Eyes, Head, Stature), Φ̄²_q, K_q − 1 and LI_q.]
Average of the linkage indexes weighted by the numbers of categories minus 1: ALI = ….

MCA: Linkage Ratio of a Question
Definition. The linkage ratio of question q is LR_q = LI_q / ALI.
Average of the linkage ratios weighted by the numbers of categories minus 1:
Σ_q (K_q − 1) LR_q / (K − Q) = 1.

MCA: Linkage Ratios of the Questions
[Table of LR_q and K_q − 1 for Hair, Eyes, Head and Stature; weighted average 1. Chart of the linkage ratios, increasing from Head to Hair, Eyes and Stature.]

MCA: Contributions to Axes
[Table of the contributions of the questions to Axes 1-6 (Ctr_q^l), with their sums and the eigenvalues.]
For each axis l: Σ_q Ctr_q^l = 1. For each question q: Σ_l Ctr_q^l = K_q − 1.
w_q: vector of the contributions of question q to the axes. For each q: Corr(w_q, λ) = 0.

MCA: Theorem
In MCA of a table with Q questions, for each question q:
Var_{w_q}(λ) = ((Q − 1)/Q²) LI_q, hence XR_{w_q}(λ) = LR_q.
Interpretation:
- the higher the linkage ratio of a question, the higher the contributions of this question to the extreme eigenvalues;
- the lower the linkage ratio of a question, the higher the contributions of this question to the central eigenvalues.

Burt Example: Stature
Expansion ratio: XR = 1.95.
[Bar chart: contributions of Stature to the axes (%).]

Burt Example: Eyes
Expansion ratio: XR = 1.32.
[Bar chart: contributions of Eyes to the axes (%).]

Burt Example: Hair
Expansion ratio: XR = 0.59.
[Bar chart: contributions of Hair to the axes (%).]

Burt Example: Head
Expansion ratio: XR = 0.23.
[Bar chart: contributions of Head to the axes (%).]

MCA: Summary
In MCA:
- the variance of the eigenvalues is proportional to the average linkage index;
- the distribution of the contributions of a question to the axes depends on the linkage ratio of this question.

Suggestions
- In PCA results: report the ALI and the LIs or LRs of the variables.
- In MCA results: report the Φ² table, the ALI and the LIs or LRs of the questions.
- In agglomerative hierarchical clustering of variables: use the within-class ALI as an aggregation index.


More information

Lecture 5: November 19, Minimizing the maximum intracluster distance

Lecture 5: November 19, Minimizing the maximum intracluster distance Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction

More information

Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors

Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors Kristoffer Hellton Department of Mathematics, University of Oslo May 12, 2015 K. Hellton (UiO) Distribution

More information

Principal Components Theory Notes

Principal Components Theory Notes Principal Components Theory Notes Charles J. Geyer August 29, 2007 1 Introduction These are class notes for Stat 5601 (nonparametrics) taught at the University of Minnesota, Spring 2006. This not a theory

More information

Dimension Reduction (PCA, ICA, CCA, FLD,

Dimension Reduction (PCA, ICA, CCA, FLD, Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction

More information

Class 11 Maths Chapter 15. Statistics

Class 11 Maths Chapter 15. Statistics 1 P a g e Class 11 Maths Chapter 15. Statistics Statistics is the Science of collection, organization, presentation, analysis and interpretation of the numerical data. Useful Terms 1. Limit of the Class

More information

Correlation and Regression

Correlation and Regression Correlation and Regression. ITRDUCTI Till now, we have been working on one set of observations or measurements e.g. heights of students in a class, marks of students in an exam, weekly wages of workers

More information

Machine Learning - MT Clustering

Machine Learning - MT Clustering Machine Learning - MT 2016 15. Clustering Varun Kanade University of Oxford November 28, 2016 Announcements No new practical this week All practicals must be signed off in sessions this week Firm Deadline:

More information

STATISTICAL LEARNING SYSTEMS

STATISTICAL LEARNING SYSTEMS STATISTICAL LEARNING SYSTEMS LECTURE 8: UNSUPERVISED LEARNING: FINDING STRUCTURE IN DATA Institute of Computer Science, Polish Academy of Sciences Ph. D. Program 2013/2014 Principal Component Analysis

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

Pharmaceutical Experimental Design and Interpretation

Pharmaceutical Experimental Design and Interpretation Pharmaceutical Experimental Design and Interpretation N. ANTHONY ARMSTRONG, B. Pharm., Ph.D., F.R.Pharm.S., MCPP. KENNETH C. JAMES, M. Pharm., Ph.D., D.Sc, FRSC, F.R.Pharm.S., C.Chem. Welsh School of Pharmacy,

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

BIO 682 Multivariate Statistics (Lite) Spring 2010

BIO 682 Multivariate Statistics (Lite) Spring 2010 BIO 682 Multivariate Statistics (Lite) Spring 2010 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 10 Outline for This Section 1. Multiple regression in ecological and behavioral

More information

arxiv: v1 [cs.lg] 28 Nov 2007

arxiv: v1 [cs.lg] 28 Nov 2007 Covariance and PCA for Categorical Variables arxiv:711.4452v1 [cs.lg] 28 Nov 27 Hirotaka Niitsuma and Takashi Okada November 9, 218 Abstract Covariances from categorical variables are defined using a regular

More information

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision) CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions

More information

18.440: Lecture 25 Covariance and some conditional expectation exercises

18.440: Lecture 25 Covariance and some conditional expectation exercises 18.440: Lecture 25 Covariance and some conditional expectation exercises Scott Sheffield MIT Outline Covariance and correlation Outline Covariance and correlation A property of independence If X and Y

More information

Chapter 10 Conjoint Use of Variables Clustering and PLS Structural Equations Modeling

Chapter 10 Conjoint Use of Variables Clustering and PLS Structural Equations Modeling Chapter 10 Conjoint Use of Variables Clustering and PLS Structural Equations Modeling Valentina Stan and Gilbert Saporta Abstract In PLS approach, it is frequently assumed that the blocks of variables

More information

EXPECTED VALUE of a RV. corresponds to the average value one would get for the RV when repeating the experiment, =0.

EXPECTED VALUE of a RV. corresponds to the average value one would get for the RV when repeating the experiment, =0. EXPECTED VALUE of a RV corresponds to the average value one would get for the RV when repeating the experiment, independently, infinitely many times. Sample (RIS) of n values of X (e.g. More accurately,

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

MULTIVARIATE DISTRIBUTIONS

MULTIVARIATE DISTRIBUTIONS Chapter 9 MULTIVARIATE DISTRIBUTIONS John Wishart (1898-1956) British statistician. Wishart was an assistant to Pearson at University College and to Fisher at Rothamsted. In 1928 he derived the distribution

More information

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang. Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

More information

Principal component analysis

Principal component analysis Principal component analysis Angela Montanari 1 Introduction Principal component analysis (PCA) is one of the most popular multivariate statistical methods. It was first introduced by Pearson (1901) and

More information

CS70: Jean Walrand: Lecture 22.

CS70: Jean Walrand: Lecture 22. CS70: Jean Walrand: Lecture 22. Confidence Intervals; Linear Regression 1. Review 2. Confidence Intervals 3. Motivation for LR 4. History of LR 5. Linear Regression 6. Derivation 7. More examples Review:

More information

Econ 371 Problem Set #1 Answer Sheet

Econ 371 Problem Set #1 Answer Sheet Econ 371 Problem Set #1 Answer Sheet 2.1 In this question, you are asked to consider the random variable Y, which denotes the number of heads that occur when two coins are tossed. a. The first part of

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Graph Functional Methods for Climate Partitioning

Graph Functional Methods for Climate Partitioning Graph Functional Methods for Climate Partitioning Mathilde Mougeot - with D. Picard, V. Lefieux*, M. Marchand* Université Paris Diderot, France *Réseau Transport Electrique (RTE) Buenos Aires, 2015 Mathilde

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information