B. Weaver (18-Oct-2001). Chapter 7: Factor Analysis


7.1 Introduction

Factor analysis (FA) was developed by C. Spearman. It is a technique for examining the interrelationships in a set of variables. As Ferguson and Takane (1989) put it, concern is with "the description and interpretation of interdependencies within a single set of variables." The usefulness of this technique becomes apparent when the number of variables under consideration is large. For example, Table 7.1 shows the means and standard deviations for 24 psychological tests that were administered to 145 children, and Table 7.2 shows a correlation matrix for the 24 tests. Clearly, it is very difficult to glean any useful information from inspection of these tables. The information needs to be summarized somehow to make it more readily understandable.

Table 7.1 Means and standard deviations for 24 psychological tests (the numeric means and standard deviations are not reproduced here). The 24 tests are:

1. Visual perception
2. Cubes
3. Paper form board
4. Flags
5. General information
6. Paragraph comprehension
7. Sentence completion
8. Word classification
9. Word meaning
10. Addition
11. Code
12. Counting dots
13. Straight-curved capitals
14. Word recognition
15. Number recognition
16. Figure recognition
17. Object-number
18. Number-figure
19. Figure-word
20. Deduction
21. Numerical puzzles
22. Problem reasoning
23. Series completion
24. Arithmetic problems

Table 7.2 Intercorrelations of 24 psychological tests for 145 children. (The full 24 x 24 correlation matrix is not reproduced here.)

Tables 7.1 and 7.2 are taken from Ferguson and Takane (1989). The results of a factor analysis they carried out on these data are also reproduced in Table 7.3. They identified 4 underlying factors, which they labelled verbal, speed, deduction, and memory. The numbers in the table are factor loadings, and the factor loadings are correlation coefficients. Factor loadings equal to or greater than .4 are shown in bold in Table 7.3. Thus we can see, for example, that tests 5-9 correlate very highly with the verbal factor, but not with any of the other factors. Similarly, another group of tests correlates highly with the speed factor, but apart from test 13, their correlations with all other factors are less than .4. And so on. The point is that it is much easier to see structure in the data in Table 7.3 than it is in Tables 7.1 and 7.2.

Table 7.3 Normal varimax solution for 24 psychological tests, following initial extraction using principal factor analysis. (Columns: Test; Verbal F_1; Speed F_2; Deduction F_3; Memory F_4; Calculated communality; plus rows for the variance explained by each factor, V_p, and the corresponding percentage. The loadings themselves are not reproduced here.)

7.2 Basic concepts of factor analysis

Afifi and Clark (1990) do a masterful job of introducing the basic concepts of factor analysis. If it were feasible, I would simply ask all of you to read Chapters 14 and 15 of their

book. However, the library has only one copy, and the book is no doubt quite expensive. Therefore, I will quote extensively from certain sections of it. The shaded area that follows is Afifi and Clark's Section 15.4:

In factor analysis we begin with a set of variables X_1, X_2, ..., X_P. These variables are usually standardized by the computer program so that their variances are each equal to one and their covariances are correlation coefficients. In the remainder of this chapter we therefore assume that each x_i is a standardized variable, i.e.,

x_i = (X_i - X̄_i) / S_i

In the jargon of factor analysis the x_i's are called the original or response variables. The object of factor analysis is to represent each of these variables as a linear combination of a smaller set of common factors plus a factor unique to each of the response variables. We express this representation as

x_1 = l_11 F_1 + l_12 F_2 + ... + l_1m F_m + e_1
x_2 = l_21 F_1 + l_22 F_2 + ... + l_2m F_m + e_2
...
x_P = l_P1 F_1 + l_P2 F_2 + ... + l_Pm F_m + e_P

where the following assumptions are made:

1. m is the number of common factors (typically this number is much smaller than P).
2. F_1, F_2, ..., F_m are the common factors. These factors are assumed to have zero means and unit variances.
3. l_ij is the coefficient of F_j in the linear combination describing x_i. This term is called the loading of the ith variable on the jth common factor.
4. e_1, e_2, ..., e_P are unique factors, each relating to one of the original variables.

The above equations and assumptions constitute the so-called factor model. Thus each of the response variables is composed of a part due to the common factors and a part due to its own unique factor [emphasis added]. The part due to the common factors is assumed to be a linear combination of these factors. As an example, suppose that x_1, x_2, x_3, x_4, x_5 are the standardized scores of an individual on five tests. If m = 2 [i.e., there are 2 common factors], we assume the following model:

x_1 = l_11 F_1 + l_12 F_2 + e_1
x_2 = l_21 F_1 + l_22 F_2 + e_2
...
x_5 = l_51 F_1 + l_52 F_2 + e_5
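To make the factor model concrete, here is a minimal numerical sketch (not from Weaver or Afifi and Clark; the loading values are made up for illustration). It generates standardized scores from a two-factor model and checks that the implied correlation matrix equals LL' + diag(u²):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loading matrix: 5 variables (rows) on m = 2 common factors (columns).
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.2, 0.7],
              [0.1, 0.8],
              [0.2, 0.6]])

h2 = (L ** 2).sum(axis=1)        # communalities
u2 = 1.0 - h2                    # specificities, so that each Var(x_i) = 1

N = 100_000
F = rng.standard_normal((N, 2))                  # common factors: mean 0, variance 1
e = rng.standard_normal((N, 5)) * np.sqrt(u2)    # unique factors
X = F @ L.T + e                                  # x_i = l_i1*F_1 + l_i2*F_2 + e_i

# The model implies Corr(X) = L L' + diag(u2); compare to the sample correlations.
implied = L @ L.T + np.diag(u2)
print(np.round(np.corrcoef(X, rowvar=False) - implied, 2))  # ~ all zeros
```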

Each of the five scores consists of two parts: a part due to the common factors F_1 and F_2, and a part due to the unique factor for that test. The common factors F_1 and F_2 might be considered the individual's verbal and quantitative abilities, respectively [for example]. The unique factors express the individual variation on each test score. The unique factor includes all other effects that keep the common factors from completely defining a particular x_i. In a sample of N individuals we can express the equations by adding a subscript to each x_i, F_j, and e_i to represent the individual. For the sake of simplicity this subscript was omitted in the model presented here.

The factor model is, in a sense, the mirror image of the principal components model, where each principal component is expressed as a linear combination of the variables. Also, the number of principal components is equal to the number of original variables (although we may not use all of the principal components). [I will discuss principal components analysis in more detail later.] On the other hand, in factor analysis we choose the number of factors to be smaller than the number of response variables. Ideally, the number of factors should be known in advance, although this is often not the case. However, as we will discuss later, it is possible to allow the data themselves to determine this number.

The factor model, by breaking each response variable x_i into two parts, also breaks the variance of x_i into two parts [emphasis added]. Since x_i is standardized, its variance is one. This variance of one is composed of the following two parts:

1. The communality, i.e., the part of the variance that is due to the common factors.
2. The specificity, i.e., the part of the variance that is due to the unique factor e_i.

Denoting the communality of x_i by h_i^2 and the specificity by u_i^2, we can write the variance of x_i as

Var x_i = 1 = h_i^2 + u_i^2

In words, the variance of x_i equals the communality plus the specificity.

7.3 Two steps in factor analysis

There are two steps in factor analysis: 1) initial factor extraction, and 2) rotation of the factors. There are a number of techniques for carrying out each of these steps, and as Tabachnick and Fidell (1989) point out, these may be used in various combinations: "None of the extraction techniques routinely provides an interpretable solution without rotation. All types of extraction may be rotated by any of the procedures described in Section 12.5.2 [of Tabachnick & Fidell, 1989], except Kaiser's Second Little Jiffy Extraction, which has its own rotational procedure."

We do not have enough time, nor do I have enough knowledge (but let's not go into that!), to discuss all of the available methods of initial factor extraction and factor rotation. Therefore, let us focus on the most common techniques, and attempt to understand the basic principles. The two most common factor extraction techniques are principal components analysis (PCA) and principal factor analysis (PFA). (PFA was the method used by Ferguson & Takane to generate the solution shown in Table 7.3.) PCA is a bit easier to understand, so I will start with it. I will first discuss PCA as a statistical technique in its own
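The communality/specificity split can be read directly off a loading matrix: h_i² is the sum of squared loadings in row i, and u_i² = 1 - h_i². A small self-contained sketch, using the two loadings for x_1 that appear later in Table 15.3:

```python
import numpy as np

# Loadings of one standardized variable on m = 2 common factors (from Table 15.3).
l_i = np.array([0.511, 0.782])

h2 = np.sum(l_i ** 2)   # communality: variance due to the common factors
u2 = 1.0 - h2           # specificity: variance due to the unique factor
print(h2, u2)           # ~0.873 and ~0.127
```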

right, and follow that with a discussion of how it is used for initial factor extraction in factor analysis.

7.4 Principal Components Analysis

Principal components analysis (PCA) is a method for transforming a set of intercorrelated variables into a new set of uncorrelated variables. These new, uncorrelated variables are called the principal components. Each of the principal components is a linear combination of the original variables. According to Afifi and Clark (1990, p. 372), "One measure of the amount of information conveyed by each principal component is its variance. For this reason the principal components are arranged in order of decreasing variance. Thus the most informative principal component is the first, and the least informative is the last (a variable with zero variance does not distinguish between members of the population)."

The number of principal components is equal to the number of original variables. However, it is often possible to reduce the number of variables that need to be considered by analyzing only the first few principal components. In essence, this allows one to analyze a relatively small number of uncorrelated principal components rather than a large number of original variables with complex interrelationships, and in so doing to explain most (but not all) of the variance in the original set of variables (Afifi & Clark, 1990, p. 372).

7.4.1 A simple example of Principal Components Analysis

Afifi and Clark (1990) introduce the basic concepts of PCA in the following simple (and hypothetical) situation. Two variables, X_1 and X_2, are both normally distributed in the population, and we have a sample of N = 100 paired scores. The population mean and variance for X_1 are 100 and 100; the population mean and variance for X_2 are 50 and 50. The sample statistics are as shown in Table 7.4. Finally, the population correlation between X_1 and X_2 is 0.707, and the sample correlation is r = .757.

Table 7.4 Sample statistics for hypothetical data set. (Columns: Statistic; X_1; X_2. Rows: N, mean, standard deviation, variance. The numeric entries are not reproduced here.)

It will make interpretation of the results somewhat easier if we work with deviation scores rather than with the original data. Recall that a deviation score is obtained by subtracting the mean from each raw score. Thus, we can define variables x_1 and x_2 as follows:

x_1 = X_1 - X̄_1   (7.1)

x_2 = X_2 - X̄_2   (7.2)

Note that the means of x_1 and x_2 will both be equal to zero (because the mean of the deviations about the mean is equal to zero for any distribution), but the sample variances (S_1^2 and S_2^2) and r_12 will not change. What we need to do next is create two new variables called C_1 and C_2. These are the principal components. They are linear combinations of x_1 and x_2:

C_1 = a_11 x_1 + a_12 x_2   (7.3)
C_2 = a_21 x_1 + a_22 x_2   (7.4)

The coefficients (a_11, a_12, a_21, and a_22) are chosen to meet the following three requirements:

1. The variance of C_1 must be as large as possible.
2. The N pairs of C_1 and C_2 values must be uncorrelated.
3. a_11^2 + a_12^2 = a_21^2 + a_22^2 = 1

Afifi and Clark (1990) go on to say this: "The mathematical solution for the coefficients was derived by Hotelling (1933). The solution is illustrated graphically in Figure 14.2 [see below]. Principal components analysis amounts to rotating the original x_1 and x_2 axes to new C_1 and C_2 axes. The angle of rotation is determined uniquely by the requirements just stated. For a given point x_1, x_2 (see Figure 14.2) the values of C_1 and C_2 are found by drawing perpendicular lines to the new C_1 and C_2 axes. The N values of C_1 thus obtained will have the largest variance according to requirement 1. The N values of C_1 and C_2 will have a zero correlation."

[Afifi & Clark, Figure 14.2: Illustration of Two Principal Components, C_1 and C_2. The original x_1 and x_2 axes are rotated to the C_1 and C_2 axes, and a point (x_1, x_2) is projected perpendicularly onto the new axes.]

For the simple example we have been examining, the two principal components are:
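The coefficients that satisfy these three requirements are the eigenvectors of the sample covariance matrix. A sketch (not Afifi and Clark's computation) that simulates data with the population parameters given above (means 100 and 50, variances 100 and 50, population correlation 0.707) and verifies the requirements:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate N = 100 pairs from the bivariate normal described in the text.
mean = [100.0, 50.0]
cov_12 = 0.707 * np.sqrt(100.0 * 50.0)
cov = np.array([[100.0, cov_12],
                [cov_12, 50.0]])
X = rng.multivariate_normal(mean, cov, size=100)

x = X - X.mean(axis=0)            # deviation scores (eqs. 7.1 and 7.2)
S = np.cov(x, rowvar=False)       # sample covariance matrix

vals, vecs = np.linalg.eigh(S)    # eigendecomposition of S
order = np.argsort(vals)[::-1]    # largest variance first
vals, vecs = vals[order], vecs[:, order]

C = x @ vecs                      # principal component scores
print(np.sum(vecs ** 2, axis=0))                   # requirement 3: unit-norm coefficients
print(np.round(np.corrcoef(C, rowvar=False), 3))   # requirement 2: C1, C2 uncorrelated
print(vals, C.var(axis=0, ddof=1))                 # requirement 1: Var C1 largest; equals eigenvalues
```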

C_1 = 0.841 x_1 + 0.539 x_2   (7.5)
C_2 = -0.539 x_1 + 0.841 x_2   (7.6)

Note that these coefficients do meet the third requirement listed above. That is, apart from some rounding error, 0.841^2 + 0.539^2 = 1, and (-0.539)^2 + 0.841^2 = 1. Note as well that a_11 = a_22, and a_12 = -a_21. Afifi and Clark point out that this is so for the two-variable case only.

The original x_1 and x_2 axes, and the rotated C_1 and C_2 principal component axes, are shown in Figure 14.3 from Afifi and Clark (1990). The figure also shows an ellipse of concentration of the original bivariate normal distribution (Afifi & Clark, 1990). (The ellipse defines an area within which most of the original x_1, x_2 pairs fall.)

[Afifi & Clark, Figure 14.3: Plot of Principal Components for Hypothetical Bivariate Data, showing the original x_1 and x_2 axes, the rotated C_1 and C_2 axes, and the ellipse of concentration.]

To calculate values of C_1 and C_2 for a given subject, you would have to plug that subject's x_1 and x_2 scores into equations 7.5 and 7.6. If you did this for each subject, you would end up with N pairs of C_1 and C_2 scores. The means and variances of C_1 and C_2 are as follows:

C̄_1 = C̄_2 = 0   (7.7)
Var C_1 = a_11^2 S_1^2 + a_12^2 S_2^2 + 2 a_11 a_12 r_12 S_1 S_2   (7.8)
Var C_2 = a_21^2 S_1^2 + a_22^2 S_2^2 + 2 a_21 a_22 r_12 S_1 S_2   (7.9)

where S_i^2 = the variance of X_i. For our data, the variance of C_1 is 147.44, and the variance of C_2 is 17.59. These two variances have special names: they are known as the first and second eigenvalues. (You may also see the terms characteristic root, latent root, or proper value used in place of eigenvalue.) The sum of the variances of C_1 and C_2, or the sum of the eigenvalues, is equal to 165.03. Note that the sum of the variances of the original variables (x_1 and x_2) is also equal to 165.03. According to Afifi and Clark (1990), this result will always be the
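Equations 7.8 and 7.9 are just the usual variance-of-a-linear-combination formula, and the claim about the eigenvalue sum says that the total variance is preserved. A sketch using the published coefficients and the sample correlation r = .757; the sample standard deviations S_1 and S_2 below are assumed values for illustration, since Table 7.4's entries are not reproduced:

```python
# Coefficients from equations 7.5 and 7.6.
a11, a12 = 0.841, 0.539
a21, a22 = -0.539, 0.841

# r12 is from the text; S1 and S2 are assumed, in the spirit of Table 7.4.
S1, S2, r12 = 10.5, 7.5, 0.757

var_C1 = a11**2 * S1**2 + a12**2 * S2**2 + 2 * a11 * a12 * r12 * S1 * S2   # eq. 7.8
var_C2 = a21**2 * S1**2 + a22**2 * S2**2 + 2 * a21 * a22 * r12 * S1 * S2   # eq. 7.9

# Total variance is preserved: Var C1 + Var C2 = S1^2 + S2^2
# (up to rounding in the published coefficients).
print(var_C1 + var_C2, S1**2 + S2**2)
```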

case; i.e., the total variance is preserved under rotation of the principal components. Afifi and Clark also point out that: "The lengths of the axes of the ellipse of concentration (Figure 14.3) are proportional to the sample standard deviations of C_1 and C_2, respectively. These standard deviations are the square roots of the eigenvalues. It is therefore easily seen from Figure 14.3 that C_1 has a larger variance than C_2. In fact, C_1 has a larger variance than either of the original variables x_1 and x_2."

In other words, the total variance remains the same, but it is partitioned in a different way. In the original variables, the variance of x_1 accounted for 66.43% of the total variance, and x_2 accounted for the remaining 33.57%. But when you look at the principal components, C_1 accounts for 89.34% of the total variance, and C_2 for only 10.66%.

7.4.2 PCA with more than 2 variables

The basic principles we have looked at in a simple two-variable context can be extended to the case of P variables. As in the simple case, each principal component (C_i) is a linear combination of the variables x_1, x_2, ..., x_P. And according to Afifi and Clark (1990, p. 378), the following must be true:

1. Var C_1 >= Var C_2 >= ... >= Var C_P
2. The values of any two principal components are uncorrelated.
3. For any principal component, the sum of the squares of the coefficients is one.

As in the two-variable case, the variances of the principal components (Var C_i) are the eigenvalues. The sum of the eigenvalues is equal to the total variance in the original variables (i.e., the sum of the variances of the original variables). Finally, you may also encounter the term eigenvector. The ith eigenvector is just the set of coefficients of the linear combination for the ith principal component. It is also known as the characteristic vector, or latent vector.

7.4.3 Principal Components Analysis using Standardized Variables

It is often preferable to standardize the x variables before carrying out principal components analysis. (You may recall that to standardize a variable, you must subtract the mean from each score, and divide the difference by the standard deviation. The x variables are already deviation scores, so all that we need to do to standardize them is divide each one by its standard deviation.) According to Afifi and Clark (1990, p. 380), "This analysis [using standardized variables] is then equivalent to analyzing the correlation matrix instead of the covariance matrix."

When we derive the principal components from the correlation matrix, the interpretation becomes easier in two ways:

1. The total variance is simply the number of variables, P, and the proportion explained by each principal component is the corresponding eigenvalue divided by P.
2. The correlation between the ith principal component C_i and the jth variable x_j is
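Standardizing before PCA is the same as eigendecomposing the correlation matrix rather than the covariance matrix, and the two analyses genuinely differ, as the text goes on to stress. A sketch with simulated data (the mixing matrix is made up so that the three variables have very different scales):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate 3 correlated variables with very different variances.
A = np.array([[5.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.5, 0.2, 0.1]])
X = rng.standard_normal((500, 3)) @ A.T

cov_vals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
cor_vals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

print(cov_vals, cov_vals.sum())   # sums to the total variance of the raw variables
print(cor_vals, cor_vals.sum())   # sums to P = 3 exactly
# The covariance solution is dominated by the large-scale variable, while the
# correlation solution weights every variable in standard-deviation units.
```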

r_ij = a_ij (Var C_i)^(1/2)

Therefore, for a given C_i, we can compare the a_ij to quantify the relative degree of dependence of C_i on each of the standardized variables.

Finally, it is important to note that the covariance matrix and the correlation matrix yield different solutions. Furthermore, according to Afifi and Clark (1990), there is no easy way to convert the results based on the covariance matrix into those based on the correlation matrix. They also suggest that use of the correlation matrix is preferred because all variables are then measured in standard deviation units. But if it is used, they caution, then all interpretations must be made in terms of the standardized variables.

7.4.4 Reduction of dimensionality

Principal components analysis can be useful in its own right, not just as a first step (i.e., initial factor extraction) in factor analysis. A major reason for using PCA is that it may allow reduction of dimensionality. That is, if the original variables are not all mutually independent, one can probably reduce the problem from dealing with a large number of interrelated variables to dealing with a relatively small number of uncorrelated principal components. (Remember that the principal components are just linear combinations of the original variables.) The question is: how many of the principal components should be retained for further analysis?

There are no hard and fast rules to apply here, only guidelines. One rule of thumb that can be used for either the correlation matrix or the covariance matrix is to omit those principal components that do not explain at least 100/P percent of the total variance (Afifi & Clark, 1990, p. 385). For example, if you had P = 20 variables, you would omit those principal components that do not explain at least 5% of the total variance. If you are using the correlation matrix, recall that the total variance is equal to the number of variables, P; and the variance of the ith principal component (Var C_i) is the eigenvalue for that principal component. Therefore, the percentage of the total variance explained by C_i is equal to the eigenvalue for C_i times 100, divided by P:

% total variance explained by C_i = (ith eigenvalue)(100)/P   (7.10)

where P = the number of standardized x variables. If you use this method for deciding how many principal components to retain, bear in mind these words of caution from Afifi and Clark (1990, p. 385): "It should be explained that eigenvalues are estimated variances of the principal components and are therefore subject to large sample variations. Arbitrary cutoff points should thus not be taken too seriously. Ideally, the investigator should make the selection of the number of principal components on the basis of some underlying theory."

7.5 Initial factor extraction using PCA

Section 15.5 from Afifi and Clark (1990) gives a very clear explanation of how initial factor extraction may be carried out using principal components analysis. They use a hypothetical set of data with 5 variables (X_1, X_2, X_3, X_4, X_5) and N = 100 observations. The correlation matrix for these data is shown in Table 7.5.
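Equation 7.10 and the retention rules of thumb are easy to script. A sketch using the five eigenvalues from Afifi and Clark's hypothetical example, quoted later in this section (the fifth eigenvalue is inferred from the stated total of 5.000):

```python
import numpy as np

eigenvalues = np.array([2.578, 1.567, 0.571, 0.241, 0.043])  # from the correlation matrix
P = len(eigenvalues)

pct = eigenvalues * 100 / P          # equation 7.10
print(pct)                           # [51.56 31.34 11.42  4.82  0.86]

# Rule of thumb 1: keep components explaining at least 100/P percent each.
print(np.sum(pct >= 100 / P))        # 2 components survive the 100/P = 20% cutoff

# Rule of thumb 2 (eigenvalue >= 1, used in Section 7.5): also keeps 2.
print(np.sum(eigenvalues >= 1.0))
```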

Table 7.5 Correlation matrix for 100 hypothetical data points (variables X_1 through X_5; the correlation values themselves are not reproduced here).

Afifi and Clark (1990) note that there is a high correlation between X_4 and X_5, a moderately high correlation between X_1 and X_2, and a moderate correlation between X_3 and each of X_4 and X_5. Bear this in mind when we get to the final solution later. The following shaded block is taken directly (with some minor omissions) from Section 15.5 of Afifi and Clark (1990):

The basic idea is to choose the first m principal components and modify them to fit the factor model defined [earlier]. The reason for choosing the first m principal components, rather than any others, is that they explain the greatest proportion of variance and are therefore the most important. Note that the principal components are also uncorrelated and thus present an attractive choice as factors. To satisfy the assumption of unit variances of the factors, we divide each principal component by its standard deviation. That is, we define the jth common factor F_j as

F_j = C_j / (Var C_j)^(1/2)

where C_j is the jth principal component. [Note that (Var C_j)^(1/2) is the square root of Var C_j, i.e., the standard deviation of C_j.] To express each variable x_i in terms of the F_j's, we first recall the relationship between the variables x_i and the principal components C_j. Specifically,

C_1 = a_11 x_1 + a_12 x_2 + ... + a_1P x_P
C_2 = a_21 x_1 + a_22 x_2 + ... + a_2P x_P
...
C_P = a_P1 x_1 + a_P2 x_2 + ... + a_PP x_P

It may be shown mathematically that this set of equations can be inverted to express the x_i's as functions of the C_j's. The result is (see Harman 1976 or Afifi and Azen 1979):

x_1 = a_11 C_1 + a_21 C_2 + ... + a_P1 C_P
x_2 = a_12 C_1 + a_22 C_2 + ... + a_P2 C_P
...
x_P = a_1P C_1 + a_2P C_2 + ... + a_PP C_P

Note that the rows of the first set of equations become the columns of the second set of equations. Now since F_j = C_j / (Var C_j)^(1/2), it follows that C_j = F_j (Var C_j)^(1/2), and we can then express the ith equation as

x_i = a_1i F_1 (Var C_1)^(1/2) + a_2i F_2 (Var C_2)^(1/2) + ... + a_Pi F_P (Var C_P)^(1/2)

This last equation is now modified in two ways:

1. We use the notation l_ij = a_ji (Var C_j)^(1/2) for the first m components.
2. We combine the last P - m terms and denote the result by e_i.

With these manipulations we have now expressed each variable x_i as

x_i = l_i1 F_1 + l_i2 F_2 + ... + l_im F_m + e_i,   for i = 1, 2, ..., P

In other words, we have transformed the principal components model to produce the factor model. For later use, note also that when the original variables are standardized, the factor loading l_ij turns out to be the correlation between x_i and F_j (see Section 14.5). The matrix of factor loadings is sometimes called the pattern matrix. When the factor loadings are correlations between the x_i's and F_j's, as they are here, it is also called the factor structure matrix (see Dillon and Goldstein 1984). Furthermore, it can be shown mathematically that the communality of x_i is

h_i^2 = l_i1^2 + l_i2^2 + ... + l_im^2

For example, in our hypothetical data set the eigenvalues of the correlation matrix (i.e., the Var C_i) are:

Var C_1 = 2.578
Var C_2 = 1.567
Var C_3 = 0.571
Var C_4 = 0.241
Var C_5 = 0.043
Total = 5.000

Note that the sum, 5.0, is equal to P, the total number of variables. Based on the rule of thumb of selecting only those principal components corresponding to eigenvalues of one or more, we select m = 2. Using BMDP4M [a stats package], we obtain the principal components analysis factor loadings, the l_ij's shown in Table 15.3. For example, the loading of x_1 on F_1 is l_11 = 0.511, and on F_2 it is l_12 = 0.782. Thus the first equation of the factor model is

x_1 = 0.511 F_1 + 0.782 F_2 + e_1

Table 15.3 also shows the variance explained by each factor. For example, the variance explained by F_1 is 2.578, or 51.6% of the total variance of 5.
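The conversion from principal components to initial factor loadings is mechanical: eigendecompose the correlation matrix, keep the first m components, and scale each eigenvector by the square root of its eigenvalue (l_ij = a_ji (Var C_j)^(1/2)). A sketch; the correlation matrix R below is invented to mimic the pattern described for Table 7.5, since the published entries are not reproduced:

```python
import numpy as np

def pca_extraction(R, m):
    """Initial factor extraction via principal components:
    loadings l_ij = a_ji * sqrt(Var C_j) for the first m components."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]          # components in decreasing-variance order
    vals, vecs = vals[order], vecs[:, order]
    L = vecs[:, :m] * np.sqrt(vals[:m])     # loading matrix (P x m)
    h2 = np.sum(L**2, axis=1)               # communalities
    return L, h2, 1.0 - h2                  # loadings, communality, specificity

# Hypothetical 5-variable correlation matrix (assumed values for illustration).
R = np.array([[1.00, 0.60, 0.20, 0.15, 0.15],
              [0.60, 1.00, 0.25, 0.20, 0.20],
              [0.20, 0.25, 1.00, 0.45, 0.40],
              [0.15, 0.20, 0.45, 1.00, 0.80],
              [0.15, 0.20, 0.40, 0.80, 1.00]])

m = int(np.sum(np.linalg.eigvalsh(R) >= 1.0))   # eigenvalue >= 1 rule
L, h2, u2 = pca_extraction(R, m)
print(m, np.round(L, 3), np.round(h2, 3), sep="\n")
```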

[Table 15.3 Initial Factor Analysis Summary for Hypothetical Data Set, from the Principal Components Extraction Method. Columns: Variable; loadings on F_1 and F_2; communality h_i^2; specificity u_i^2. Totals: variance explained by F_1 = 2.578 (51.6%) and by F_2 = 1.567 (31.3%); total communality = 4.145 (82.9%); total specificity = 0.855 (17.1%). The individual entries are not all reproduced here.]

The communality column, h_i^2, in Table 15.3 shows the part of the variance of each variable explained by the common factors. For example, h_1^2 = 0.511^2 + 0.782^2 = 0.873. Finally, the specificity u_i^2 is the part of the variance [of x_i] not explained by the common factors. In this example we use standardized variables x_i whose variances are equal to one. Therefore, for each x_i, u_i^2 = 1 - h_i^2.

A valuable graphical aid to interpreting the factors is the factor diagram shown in Figure 15.1 [see below]. Unlike the usual scatter diagrams, where each point represents an individual case and each axis a variable, in this graph each point represents a response variable [x_i] and each axis a common factor. For example, the point labeled 1 represents the response variable x_1. The coordinates of that point are the loadings of x_1 on F_1 and F_2, respectively. The other four points represent the remaining four variables. Since the factor loadings are correlations between the standardized variables and the factors, the range of values shown on the axes of the factor diagram is -1 to +1.

It can be seen from Figure 15.1 that x_1 and x_2 load more on F_2 than they do on F_1. Conversely, x_3, x_4, and x_5 load more on F_1 than on F_2. However, these distinctions are not very clear-cut, and the technique of factor rotation will produce clearer results (see Section 15.7). [emphasis added]
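A factor diagram of this kind takes only a few lines of matplotlib. A sketch; the loadings for variables 1 and 2 are the published ones from Table 15.3, while those for variables 3-5 are assumed values for illustration:

```python
import matplotlib.pyplot as plt

# Loadings on (F1, F2): x1 and x2 from Table 15.3; x3-x5 are assumed.
loadings = {"1": (0.511, 0.782), "2": (0.553, 0.754),
            "3": (0.65, 0.30), "4": (0.80, 0.15), "5": (0.75, 0.20)}

fig, ax = plt.subplots(figsize=(4, 4))
for label, (f1, f2) in loadings.items():
    ax.plot(f1, f2, "ko", ms=3)     # each point is a variable, not a case
    ax.annotate(label, (f1, f2), textcoords="offset points", xytext=(4, 4))
ax.axhline(0, lw=0.5); ax.axvline(0, lw=0.5)
ax.set(xlim=(-1, 1), ylim=(-1, 1),  # loadings are correlations, so bounded by +/-1
       xlabel="Factor 1", ylabel="Factor 2", title="Factor diagram")
plt.show()
```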

[Afifi & Clark, Figure 15.1: Factor Diagram for Principal Components Extraction Method. Five points, one per variable, plotted by their loadings on Factor 1 and Factor 2.]

7.6 Rotation of the extracted factors

The factors obtained via initial factor extraction are almost always difficult to interpret, no matter which method of initial factor extraction one uses. Take the data in Figure 15.1, for example. It is clear that variables 1 and 2 are more highly correlated with Factor 2 than with Factor 1. (Another way of saying this is that they load more heavily on Factor 2 than they do on Factor 1.) Note, however, that the difference between these correlations (or factor loadings) is not that great. For variable 1, the two factor loadings are 0.511 and 0.782; and for variable 2, 0.553 and 0.754. Similarly, it is clear that variables 3, 4, and 5 load more heavily on Factor 1 than on Factor 2. Again though, the difference in factor loadings for the two factors is not very big for any of these three variables (see Table 15.3 from Afifi and Clark).

Interpretation would be much easier if each of the variables had a high loading on only one of the two factors, and a very low loading on the other factor. Rotation of the factors is carried out in an effort to bring about exactly this situation, which is often referred to as simple structure.

There are many factor rotation techniques to choose from. As I suggested earlier, I have neither the time nor the expertise to discuss all of these. I must, however, draw your attention to the important distinction between orthogonal and oblique rotations. Figure 15.3 (from Afifi & Clark, 1990) shows one orthogonal rotation solution (the Varimax Orthogonal Rotation solution) for the data shown in Figure 15.1; and Figure 15.5 shows one oblique rotation solution (the Direct Quartimin Oblique Rotation solution) for the same data. Before going on to the differences between these solutions, note the similarities. In both cases, variables 1 and 2 have very high loadings on Factor 2, and very low loadings

on Factor 1; and variables 3, 4, and 5 have very high loadings on Factor 1, and very low loadings on Factor 2. Therefore, both solutions have achieved the goal of factor rotation, and made interpretation of the factors easier.

For our purposes, the important difference between orthogonal and oblique rotation techniques is that the angle between the rotated factors must be 90° in the former case, but not in the latter. (In geometry, two lines that intersect are said to be oblique if the angle of intersection is not a right angle.) You may recall that orthogonal means independent, or uncorrelated. What you may not realize is that when two factors are plotted on X-Y axes, the correlation between them is equal to the cosine of the angle between their axes. Note that cos 90° = 0. Thus, when you use an orthogonal rotation technique, the rotated factors, like the initially extracted factors, are all mutually independent. When you use an oblique rotation technique, on the other hand, you relax this restriction, and allow the factors to be correlated with each other. For the oblique solution shown in Figure 15.5, for example, the correlation between Factors 1 and 2 is equal to cos 80.5° = 0.165.

Both types of solution have certain advantages. You may recall from earlier parts of the course that orthogonal variance components add up neatly to a total, and that this is not so when the components are not orthogonal. On the other hand, oblique solutions offer more flexibility, which can be desirable in certain situations.

[Afifi & Clark, Figure 15.3: Varimax Orthogonal Rotation for the Factor Diagram of Figure 15.1. The rotated axes go through the clusters of variables and are perpendicular to each other.]
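Varimax rotation itself is a short algorithm: it iteratively applies orthogonal rotations that maximize the variance of the squared loadings within each column, which pushes each variable toward a high loading on one factor and low loadings on the others. A minimal sketch of the standard SVD-based implementation (not the BMDP code used by Afifi and Clark); the loadings for variables 3-5 are assumed, as in the earlier factor-diagram sketch:

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a loading matrix L (p x m) toward simple structure."""
    p, m = L.shape
    R = np.eye(m)          # accumulated orthogonal rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        B = L @ R
        # Gradient of the varimax criterion; its SVD gives the next rotation.
        u, s, vt = np.linalg.svd(
            L.T @ (B**3 - (gamma / p) * B @ np.diag(np.sum(B**2, axis=0))))
        R = u @ vt
        d = np.sum(s)
        if d < d_old * (1 + tol):   # stop when the criterion no longer improves
            break
        d_old = d
    return L @ R

# Unrotated loadings: variables 1 and 2 from Table 15.3, variables 3-5 assumed.
L = np.array([[0.511, 0.782], [0.553, 0.754],
              [0.650, 0.300], [0.800, 0.150], [0.750, 0.200]])
print(np.round(varimax(L), 3))   # each variable now loads mainly on one factor
```

Because the rotation matrix R is orthogonal, the communalities (row sums of squared loadings) are unchanged by the rotation; only the split across factors changes.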

[Afifi & Clark, Figure 15.5: Direct Quartimin Oblique Rotation for the Factor Diagram of Figure 15.1. The rotated axes go through the clusters of variables, but are not required to be perpendicular to each other.]

7.7 Principal Factor Analysis

Earlier I mentioned that there are two main methods of initial factor extraction: principal components analysis (PCA) and principal factor analysis (PFA). PCA has already been explained. PFA is similar to PCA, but the solution is iterative. In fact, Afifi and Clark (1990) call PFA the iterated principal components method. They go on to explain how it works as follows:

To understand this method [PFA], you should recall that the communality is the part of the variance of each variable associated with the common factors. The principle underlying the iterated solution states that we should perform the factor analysis by using the communalities in place of the original variance. This principle entails substituting communality estimates for the 1's representing the variances of the standardized variables along the diagonal of the correlation matrix. With 1's in the diagonal we are factoring the total variance of the variables; with communalities in the diagonal we are factoring the variance associated with the common factors [emphasis added]. [Conversely, we are excluding that part of the variance that cannot be explained by the common factors.] Thus with communalities along the diagonal we select those common factors that maximize the total communality. Many factor analysts consider maximizing the total communality a more attractive objective than maximizing the total proportion of explained variance, as is done in the principal components method.

The problem is that communalities are not known before the factor analysis is performed [emphasis added]. Some initial estimates of the

communalities must be obtained prior to the analysis. Various procedures exist, and we recommend, in the absence of a priori estimates, that the investigator use the default option in the particular program, since the resulting factor solution is usually little affected by the initial communality estimates. The steps performed by a packaged program in carrying out the iterated factor extraction are summarized as follows:

1. Find the initial communality estimates.
2. Substitute the communalities for the diagonal elements (1's) in the correlation matrix.
3. Extract m principal components from the modified matrix.
4. Multiply the principal components coefficients by the standard deviations of the respective principal components to obtain factor loadings.
5. Compute new communalities from the computed factor loadings.
6. Replace the communalities in step 2 with these new communalities, and repeat steps 3, 4, and 5. This step constitutes an iteration.
7. Continue iterating, stopping when the communalities stay essentially the same in the last two iterations.

The initial factor extraction using PFA is summarized in Table 15.4 (from Afifi and Clark). If you compare these results to those obtained using PCA (see Table 15.3), you will note that the total communality is lower for this solution (74.6% versus 82.9%). According to Afifi and Clark (p. 406), "This result is generally the case, since in the iterative method we are factoring the total communality, which is by necessity smaller than the total variance."

[Table 15.4 Initial Factor Analysis Summary for Hypothetical Data Set, from the Iterated Principal Factor Extraction Method. Columns: Variable; loadings on F_1 and F_2; communality h_i^2; specificity u_i^2. Totals: F_1 explains 48.3% of the total variance and F_2 explains 26.3%; total communality = 3.730 (74.6%); total specificity = 1.270 (25.4%). The individual entries are not all reproduced here.]

Finally, note that the final rotated solutions using the (orthogonal) varimax procedure are also very similar whether initial factor extraction is carried out with principal components analysis (Figure 15.3) or principal factor analysis (Figure 15.4).
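Steps 1-7 translate almost line for line into code. A sketch, assuming squared multiple correlations as the initial communality estimates (a common default, though the text leaves that choice to the program); the correlation matrix reuses the invented R from the Section 7.5 sketch:

```python
import numpy as np

def principal_factor(R, m, max_iter=100, tol=1e-4):
    """Iterated principal factor extraction (steps 1-7 above)."""
    R = np.asarray(R, dtype=float)
    # Step 1: initial communalities -- squared multiple correlations (assumed default).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rh = R.copy()
        np.fill_diagonal(Rh, h2)                   # step 2: communalities on the diagonal
        vals, vecs = np.linalg.eigh(Rh)            # step 3: extract components
        idx = np.argsort(vals)[::-1][:m]
        L = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))  # step 4: loadings
        h2_new = np.sum(L**2, axis=1)              # step 5: new communalities
        if np.max(np.abs(h2_new - h2)) < tol:      # step 7: stop when stable
            return L, h2_new
        h2 = h2_new                                # step 6: iterate
    return L, h2

# Hypothetical correlation matrix (same assumed values as in the Section 7.5 sketch).
R = np.array([[1.00, 0.60, 0.20, 0.15, 0.15],
              [0.60, 1.00, 0.25, 0.20, 0.20],
              [0.20, 0.25, 1.00, 0.45, 0.40],
              [0.15, 0.20, 0.45, 1.00, 0.80],
              [0.15, 0.20, 0.40, 0.80, 1.00]])
L, h2 = principal_factor(R, m=2)
# Total communality, as a percent of total variance, comes out lower than PCA's.
print(np.round(h2, 3), round(h2.sum() / len(h2) * 100, 1))
```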

[Afifi & Clark, Figure 15.4: Varimax Orthogonal Rotation for the Factor Diagram obtained with the iterated principal factor extraction method.]

7.8 How many factors should be rotated?

One of the questions that faces any factor analyst is how many factors ought to be rotated. Under ideal circumstances, the number of factors is dictated by theory. When this is not so, one must use the data themselves to determine the number of factors. Unfortunately, there are no hard and fast rules to apply in attempting to determine this number, just as there are no hard and fast rules for deciding how many principal components to retain for further analysis in PCA (see Section 7.4.4). As in that case, there are only guidelines and rules of thumb.

One commonly used guideline is to use only those factors whose eigenvalues are equal to or greater than 1. According to Afifi and Clark (1990), this method yields roughly one factor for every three to five variables. They go on to say: "It appears to correctly estimate the number of factors when the communalities are high and the number of variables is not too large. If the communalities are low, then the number of factors obtained with a cutoff equal to the average communality tends to be similar to that obtained by the principal components method with a cutoff of 1." According to Cattell (1978), however, this method severely overestimates the number of factors in large matrices.

An alternative, and fairly common, method for determining the number of factors to rotate is Cattell's Scree method. Kline (1994) certainly endorses the Scree method. According to him, "There is now agreement among most factor analysts of any repute that Cattell's Scree test is just about the best solution to selecting the correct number of factors." (Does this mean that anyone who dares to disagree with Kline is a disreputable factor analyst?!)

To carry out the Scree test, one must make a Scree plot. You can do this by plotting the principal component numbers (on the X axis) against their eigenvalues (on the Y axis). Figure 5.4 from Kline (1994) shows a typical Scree plot. Kline says that the cutoff point for factor rotation is "where the line changes slope," and he suggests that in this case, 5 factors would be rotated. Note that if you were to apply the first rule of thumb, and take those principal components with eigenvalues equal to or greater than 1, you would rotate 7, or maybe even 8 factors. In this example, therefore, Cattell's warning about overestimation of the number of factors appears to be borne out.

[Kline (1994), Figure 5.4: A Scree test. Eigenvalues are plotted against factor number, with the change of slope marking the cutoff.]
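A Scree plot is just eigenvalues against component number. A sketch with made-up eigenvalues shaped roughly like Kline's example, i.e., with an elbow after the fifth factor:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical eigenvalues with a clear change of slope after the 5th factor.
eigenvalues = np.array([6.1, 3.4, 2.2, 1.6, 1.2, 1.05, 1.0, 0.95, 0.9, 0.85, 0.8])
ks = np.arange(1, len(eigenvalues) + 1)

plt.plot(ks, eigenvalues, "o-")
plt.axhline(1.0, ls="--", lw=0.8)   # eigenvalue >= 1 cutoff, for comparison
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
# Scree reading: rotate the factors before the slope change (5 here); the
# eigenvalue >= 1 rule would keep 7, illustrating Cattell's overestimation point.
```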

7.9 Criticisms of Factor Analysis

Factor analysis has been criticized for several reasons. One criticism is that it lacks rigor, because many of the decisions that are made along the way (e.g., how many factors to rotate, whether to use orthogonal or oblique rotation, etc.) are completely arbitrary. Perhaps an even more serious criticism is that there are an infinite number of ways to rotate the initially extracted factors--and that all of these solutions are equivalent from a purely mathematical point of view. For this reason, the results of factor analytic studies have been very difficult to replicate. In response to these kinds of criticisms, Kline (1994, p. 182) says this: "As was argued [earlier], rotation to simple structure usually yields replicable factors and the technical rules for reaching simple structure have been explicated [emphasis added]. [A matrix has simple structure when each variable has moderate to high loadings on only one or two factors, and very low loadings on the other factors.] In exploratory factor analysis simple structure and factor replicability is the answer to the problem of the indeterminacy of factor analysis. An infinity of solutions there may be, but the simple structure solution is best [emphasis added]. Certainly, as Cattell

(1978) has demonstrated, failure to reach simple structure is the main cause of discrepant results in the field of personality testing. Thus it is essential to reach simple structure, and this can easily be done by following the technical rules for adequate factor analyses."

7.10 Miscellaneous points that didn't fit anywhere else!

As we have seen, factor analysis and principal components analysis are closely related techniques. Note, however, that both of these techniques differ from regression analysis in that "we do not have a dependent variable to be explained by a set of independent variables" (Afifi & Clark, 1990, p. 396). Rather, principal components analysis is concerned with explaining as much of the total variance as possible, and factor analysis is concerned primarily with explaining the interrelationships amongst the original variables.

Finally, I should point out the possibility of computing factor scores for individuals. For example, if the two factors are labeled Verbal and Quantitative, you may want to have two scores for each individual that reflect their standing on these factors. For more information on how to go about doing this, see Section 15.8 in Afifi and Clark (1990).
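One standard way to compute factor scores (not necessarily the procedure in Afifi and Clark's Section 15.8) is the regression method, which estimates the scores as F̂ = Z R⁻¹ L, where Z holds the standardized variables, R is their correlation matrix, and L is the (rotated) loading matrix. A sketch with simulated data and an assumed loading matrix:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate data from a known two-factor model (loadings assumed for illustration).
L_true = np.array([[0.8, 0.1], [0.7, 0.2], [0.2, 0.7], [0.1, 0.8], [0.2, 0.6]])
u2 = 1 - np.sum(L_true**2, axis=1)
F = rng.standard_normal((1000, 2))
X = F @ L_true.T + rng.standard_normal((1000, 5)) * np.sqrt(u2)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize the variables
R = np.corrcoef(Z, rowvar=False)

# Regression-method factor scores: F_hat = Z R^{-1} L.
F_hat = Z @ np.linalg.solve(R, L_true)
print(np.round(np.corrcoef(F_hat[:, 0], F[:, 0])[0, 1], 2))   # close to 1
```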

Review Questions

1. What are the two main goals of factor analysis?
2. What are common factors?
3. What are unique factors?
4. Define communality and specificity.
5. What is a principal component?
6. What is an eigenvalue?
7. There are two steps in factor analysis. What are they?
8. What is a Scree plot? How is one used to choose the number of factors to rotate?
9. What is the difference between orthogonal and oblique rotation?
10. One of the major criticisms of factor analysis is that there are an infinite number of solutions to any problem, all of which are equally valid from a mathematical point of view. A defense against this criticism is that some solutions are better (or simpler) than others. Explain what is meant by a good solution. [HINT: Think about simple structure.]


More information

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i B. Weaver (24-Mar-2005) Multiple Regression... 1 Chapter 5: Multiple Regression 5.1 Partial and semi-partial correlation Before starting on multiple regression per se, we need to consider the concepts

More information

FACULTY OF APPLIED ECONOMICS

FACULTY OF APPLIED ECONOMICS FACULTY OF APPLIED ECONOMICS DEPARTMENT OF MATHEMATICS, STATISTICS AND ACTUARIAL SCIENCES French Regional Wheat Prices : 1756-1870 An Illustrated Statistical Analysis E. Borghers RESEARCH PAPER 06-003

More information

Relationships between variables. Visualizing Bivariate Distributions: Scatter Plots

Relationships between variables. Visualizing Bivariate Distributions: Scatter Plots SFBS Course Notes Part 7: Correlation Bivariate relationships (p. 1) Linear transformations (p. 3) Pearson r : Measuring a relationship (p. 5) Interpretation of correlations (p. 10) Relationships between

More information

Implementing Horn s parallel analysis for principal component analysis and factor analysis

Implementing Horn s parallel analysis for principal component analysis and factor analysis The Stata Journal (2009) 9, Number 2, pp. 291 298 Implementing Horn s parallel analysis for principal component analysis and factor analysis Alexis Dinno Department of Biological Sciences California State

More information

Principal Components. Summary. Sample StatFolio: pca.sgp

Principal Components. Summary. Sample StatFolio: pca.sgp Principal Components Summary... 1 Statistical Model... 4 Analysis Summary... 5 Analysis Options... 7 Scree Plot... 8 Component Weights... 9 D and 3D Component Plots... 10 Data Table... 11 D and 3D Component

More information

Finite Mathematics Chapter 2. where a, b, c, d, h, and k are real numbers and neither a and b nor c and d are both zero.

Finite Mathematics Chapter 2. where a, b, c, d, h, and k are real numbers and neither a and b nor c and d are both zero. Finite Mathematics Chapter 2 Section 2.1 Systems of Linear Equations: An Introduction Systems of Equations Recall that a system of two linear equations in two variables may be written in the general form

More information

Regression Analysis. Ordinary Least Squares. The Linear Model

Regression Analysis. Ordinary Least Squares. The Linear Model Regression Analysis Linear regression is one of the most widely used tools in statistics. Suppose we were jobless college students interested in finding out how big (or small) our salaries would be 20

More information

Interaction effects for continuous predictors in regression modeling

Interaction effects for continuous predictors in regression modeling Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Linear Dimensionality Reduction

Linear Dimensionality Reduction Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis

More information

April 2016 Draft. DRAFT New Louisiana Standards for Correlation to Eureka Math Page 1. eureka math.org 2016 Great Minds

April 2016 Draft. DRAFT New Louisiana Standards for Correlation to Eureka Math Page 1. eureka math.org 2016 Great Minds DRAFT New Louisiana Standards for 2016 2017 Correlation to Eureka Math Grade 8 April 2016 Draft DRAFT New Louisiana Standards for 2016 2017 Correlation to Eureka Math Page 1 Grade 8 Mathematics The majority

More information

Getting Started with Communications Engineering. Rows first, columns second. Remember that. R then C. 1

Getting Started with Communications Engineering. Rows first, columns second. Remember that. R then C. 1 1 Rows first, columns second. Remember that. R then C. 1 A matrix is a set of real or complex numbers arranged in a rectangular array. They can be any size and shape (provided they are rectangular). A

More information

Pre-sessional Mathematics for Big Data MSc Class 2: Linear Algebra

Pre-sessional Mathematics for Big Data MSc Class 2: Linear Algebra Pre-sessional Mathematics for Big Data MSc Class 2: Linear Algebra Yuri Kalnishkan September 22, 2018 Linear algebra is vitally important for applied mathematics We will approach linear algebra from a

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where:

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where: VAR Model (k-variate VAR(p model (in the Reduced Form: where: Y t = A + B 1 Y t-1 + B 2 Y t-2 + + B p Y t-p + ε t Y t = (y 1t, y 2t,, y kt : a (k x 1 vector of time series variables A: a (k x 1 vector

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

Linear Programming Redux

Linear Programming Redux Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains

More information

Factor Analysis Continued. Psy 524 Ainsworth

Factor Analysis Continued. Psy 524 Ainsworth Factor Analysis Continued Psy 524 Ainsworth Equations Extraction Principal Axis Factoring Variables Skiers Cost Lift Depth Powder S1 32 64 65 67 S2 61 37 62 65 S3 59 40 45 43 S4 36 62 34 35 S5 62 46 43

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Unit 8 - Polynomial and Rational Functions Classwork

Unit 8 - Polynomial and Rational Functions Classwork Unit 8 - Polynomial and Rational Functions Classwork This unit begins with a study of polynomial functions. Polynomials are in the form: f ( x) = a n x n + a n 1 x n 1 + a n 2 x n 2 +... + a 2 x 2 + a

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Appendix A: Matrices

Appendix A: Matrices Appendix A: Matrices A matrix is a rectangular array of numbers Such arrays have rows and columns The numbers of rows and columns are referred to as the dimensions of a matrix A matrix with, say, 5 rows

More information

Principal Components Analysis using R Francis Huang / November 2, 2016

Principal Components Analysis using R Francis Huang / November 2, 2016 Principal Components Analysis using R Francis Huang / huangf@missouri.edu November 2, 2016 Principal components analysis (PCA) is a convenient way to reduce high dimensional data into a smaller number

More information

Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

More information

Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

More information

e 2 e 1 (a) (b) (d) (c)

e 2 e 1 (a) (b) (d) (c) 2.13 Rotated principal component analysis [Book, Sect. 2.2] Fig.: PCA applied to a dataset composed of (a) 1 cluster, (b) 2 clusters, (c) and (d) 4 clusters. In (c), an orthonormal rotation and (d) an

More information

CS 246 Review of Linear Algebra 01/17/19

CS 246 Review of Linear Algebra 01/17/19 1 Linear algebra In this section we will discuss vectors and matrices. We denote the (i, j)th entry of a matrix A as A ij, and the ith entry of a vector as v i. 1.1 Vectors and vector operations A vector

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools Pre-Algebra 8 Pre-Algebra 8 BOE Approved 05/21/2013 1 PRE-ALGEBRA 8 Critical Areas of Focus In the Pre-Algebra 8 course, instructional time should focus on three critical

More information

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER /2018

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER /2018 ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER 1 2017/2018 DR. ANTHONY BROWN 1. Arithmetic and Algebra 1.1. Arithmetic of Numbers. While we have calculators and computers

More information

Karhunen-Loève Transform KLT. JanKees van der Poel D.Sc. Student, Mechanical Engineering

Karhunen-Loève Transform KLT. JanKees van der Poel D.Sc. Student, Mechanical Engineering Karhunen-Loève Transform KLT JanKees van der Poel D.Sc. Student, Mechanical Engineering Karhunen-Loève Transform Has many names cited in literature: Karhunen-Loève Transform (KLT); Karhunen-Loève Decomposition

More information

Vectors and their uses

Vectors and their uses Vectors and their uses Sharon Goldwater Institute for Language, Cognition and Computation School of Informatics, University of Edinburgh DRAFT Version 0.95: 3 Sep 2015. Do not redistribute without permission.

More information

Math 381 Midterm Practice Problem Solutions

Math 381 Midterm Practice Problem Solutions Math 381 Midterm Practice Problem Solutions Notes: -Many of the exercises below are adapted from Operations Research: Applications and Algorithms by Winston. -I have included a list of topics covered on

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental

More information

Grade 7 Honors Yearlong Mathematics Map

Grade 7 Honors Yearlong Mathematics Map rade 7 Honors Yearlong Mathematics Map Resources: Approved from Board of Education Assessments: PARCC Assessments, Performance Series, District Benchmark Assessments NS NS Common Core State Standards Standards

More information

The Matrix Algebra of Sample Statistics

The Matrix Algebra of Sample Statistics The Matrix Algebra of Sample Statistics James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) The Matrix Algebra of Sample Statistics

More information

Linear Algebra Basics

Linear Algebra Basics Linear Algebra Basics For the next chapter, understanding matrices and how to do computations with them will be crucial. So, a good first place to start is perhaps What is a matrix? A matrix A is an array

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information