Introduction to Factor Analysis
Lecture 11, November 2, 2005
Multivariate Analysis
Lecture #11 - 11/2/2005, Slide 1 of 58
Today's Lecture
Factor analysis:
- Exploratory factor analysis (EFA).
- Confirmatory factor analysis (CFA).
- How to do an EFA or CFA.
- Comparisons to PCA.
Reminder: Homework #6 is due this Friday. The next homework (#7) will be posted on Friday and is due next Friday (November 11th).
History of Factor Analysis
Factor analysis has a long history in psychology, dating back to the work of Charles Spearman and Karl Pearson in the early 1900s.
History of Factor Analysis
Spearman's seminal 1904 paper has shaped psychology as a science.
Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201-293.
Notable quotations from this paper:
"Most of those hostile to Experimental Psychology are in the habit of reproaching its methods with insignificance, and even with triviality...they protest that such means can never shed any real light upon the human soul, unlock the eternal antinomy of Free Will, or reveal the inward nature of Time and Space." (p. 203)
"The present article, therefore, advocates a Correlational Psychology." (p. 205)
Measurement of Intelligence
The idea Spearman was pursuing with his work was a way to pin down intelligence.
At the time, psychologists thought that intelligence could be defined by a single, all-encompassing unobservable entity, called g (for general intelligence).
In his paper, Spearman sought to describe the influence of g on examinees' test scores in several domains: Pitch, Light, Weight, Classics, French, English, and Mathematics.
In reality, g may or may not exist, but postulating g provides a mechanism to detect common correlations among such variables.
Measurement of Intelligence
The model proposed by Spearman was very similar to a linear regression model:

X_{i1} = \mu_1 + \lambda_1 g_i + \epsilon_{i1}
X_{i2} = \mu_2 + \lambda_2 g_i + \epsilon_{i2}
\vdots
X_{ip} = \mu_p + \lambda_p g_i + \epsilon_{ip}

Here:
- X_{ij} is the score of examinee i (i = 1, ..., n) on test domain j (j = 1, ..., p).
- \mu_j is the mean of test domain j.
- g_i is the value of the intelligence factor for person i.
- \lambda_j is the loading of test domain j onto the general ability factor g.
- \epsilon_{ij} is the random error term for person i and test domain j.
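The course examples use SAS, but the mechanics of this model are easy to see in a small simulation. Below is a Python/NumPy sketch (all means, loadings, and error variances are invented for illustration) that generates scores from the Spearman model and shows that the between-domain covariances come out near \lambda_j \lambda_k:

```python
import numpy as np

# Hypothetical illustration of Spearman's single-factor model:
# X_ij = mu_j + lambda_j * g_i + eps_ij. All parameter values are made up.
rng = np.random.default_rng(0)
n, p = 5000, 4                             # examinees, test domains
mu = np.array([50.0, 60.0, 55.0, 65.0])    # domain means
lam = np.array([0.8, 0.7, 0.6, 0.5])       # loadings on g
psi = np.array([0.36, 0.51, 0.64, 0.75])   # error variances

g = rng.standard_normal(n)                       # g ~ N(0, 1)
eps = rng.standard_normal((n, p)) * np.sqrt(psi)
X = mu + np.outer(g, lam) + eps                  # n x p score matrix

# With one common factor, each off-diagonal covariance is
# approximately lambda_j * lambda_k (e.g., 0.8 * 0.7 = 0.56).
print(np.cov(X, rowvar=False).round(2))
```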
Spearman Model Assumptions
The Spearman factor model has the following assumptions:
- E(g) = \mu_g = 0
- Var(g) = 1
- E(\epsilon_{ij}) = 0 for all i and j.
- Var(\epsilon_{ij}) = \phi_j
Although the Spearman model is very similar to a regression model (just replace g with an observed variable), estimation of the model cannot proceed like regression because g is not observed (we will get to model estimation shortly).
Note that the Spearman model is still very much in effect today: it is the basis for Item Response Theory (IRT), the methods used to estimate ability from test scores (and how you received your score on the GRE).
Common Factor Model
As theories of intelligence began to change from generalized ability to specialized abilities, the Spearman model with a single latent construct became less popular.
In the 1930s, L. L. Thurstone developed the common factor model.
The common factor model posited that scores were a function of multiple latent variables, variables representing more specialized abilities.
[Photo: Louis Leon Thurstone]
Common Factor Model
The common factor model was also very similar to a linear multiple regression model:

X_{i1} = \mu_1 + \lambda_{11} f_{i1} + \lambda_{12} f_{i2} + ... + \lambda_{1m} f_{im} + \epsilon_{i1}
X_{i2} = \mu_2 + \lambda_{21} f_{i1} + \lambda_{22} f_{i2} + ... + \lambda_{2m} f_{im} + \epsilon_{i2}
\vdots
X_{ip} = \mu_p + \lambda_{p1} f_{i1} + \lambda_{p2} f_{i2} + ... + \lambda_{pm} f_{im} + \epsilon_{ip}

Where:
- X_{ij} is the response of person i (i = 1, ..., n) on variable j (j = 1, ..., p).
- \mu_j is the mean of variable j.
- f_{ik} is the factor score for person i on factor k (k = 1, ..., m).
- \lambda_{jk} is the loading of variable j onto factor k.
- \epsilon_{ij} is the random error term for person i and variable j.
Common Factor Model
As you could probably guess, the common factor model can be put more succinctly with matrices:

X_i = \mu + \Lambda F_i + \epsilon_i
(p x 1) = (p x 1) + (p x m)(m x 1) + (p x 1)

Where:
- X_i is the response vector of person i (i = 1, ..., n), containing variables j = 1, ..., p.
- \mu is the mean vector (containing the means of all variables).
- F_i is the factor score vector for person i, containing factor scores k = 1, ..., m.
- \Lambda is the factor loading matrix (factor pattern matrix).
- \epsilon_i is the random error vector for person i, containing errors for variables j = 1, ..., p.
Common Factor Model Assumptions
Depending upon the assumptions made for the common factor model, two types of factor analyses are defined:
- Exploratory factor analysis (EFA).
- Confirmatory factor analysis (CFA).
EFA seeks to determine:
- The number of factors that exist.
- The relationship between each variable and each factor.
CFA seeks to:
- Validate the factor structure presumed by the analysis.
- Measure the relationships between the factors.
CFA and, subsequently, structural equation modeling (SEM) were theorized by Bock in 1960.
Assumptions
Exploratory factor analysis makes assumptions that allow for estimation of all factor loadings for each requested factor.
Given the common factor model:

X_i = \mu + \Lambda F_i + \epsilon_i
(p x 1) = (p x 1) + (p x m)(m x 1) + (p x 1)

The assumptions are:
- F_i and \epsilon_i are independent.
- E(F) = 0.
- Cov(F) = I (a key assumption in EFA: uncorrelated factors).
- E(\epsilon) = 0.
- Cov(\epsilon) = \Psi, where \Psi is a diagonal matrix.
Implications
Due to the model parameterization and assumptions, the common factor model specifies the following covariance structure for the observable data:

Cov(X) = \Sigma = \Lambda \Lambda' + \Psi

To illustrate what this looks like:

Var(X_i) = \sigma_{ii} = \lambda_{i1}^2 + ... + \lambda_{im}^2 + \psi_i
Cov(X_i, X_k) = \sigma_{ik} = \lambda_{i1} \lambda_{k1} + ... + \lambda_{im} \lambda_{km}

The model-specified covariance matrix \Sigma illustrates the background assumption of the factor model: that variable intercorrelation is a function of the factors in the model.
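This covariance structure can be verified numerically. The following Python/NumPy sketch (parameter values invented for illustration) simulates data under the EFA assumptions and compares the sample covariance matrix with \Lambda\Lambda' + \Psi:

```python
import numpy as np

# Check that Cov(X) = Lambda Lambda' + Psi under the EFA assumptions,
# using a large simulated sample. All parameter values are invented.
rng = np.random.default_rng(1)
n, p, m = 20000, 5, 2
Lam = rng.uniform(0.3, 0.8, size=(p, m))      # p x m loading matrix
Psi = np.diag(rng.uniform(0.2, 0.5, size=p))  # diagonal unique variances

F = rng.standard_normal((n, m))               # Cov(F) = I, E(F) = 0
eps = rng.multivariate_normal(np.zeros(p), Psi, size=n)
X = F @ Lam.T + eps                           # mean vector taken as 0 here

Sigma_model = Lam @ Lam.T + Psi
Sigma_sample = np.cov(X, rowvar=False)
print(np.abs(Sigma_model - Sigma_sample).max())  # small for large n
```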
More Implications
The common factor model also specifies that the factor loadings give the covariance between the observable variables and the unobserved factors:

Cov(X, F) = \Lambda

Another way of putting this statement is:

Cov(X_i, F_j) = \lambda_{ij}
EFA Definitions
Because of how the EFA model is estimated, a couple of definitions are needed.
From two slides ago, we noted that the model-predicted variance was defined as:

Var(X_i) = \sigma_{ii} = (\lambda_{i1}^2 + ... + \lambda_{im}^2) + \psi_i = communality + specific variance

The proportion of variance of the i-th variable contributed by the m common factors is called the i-th communality:

h_i^2 = \lambda_{i1}^2 + ... + \lambda_{im}^2

The proportion of variance of the i-th variable due to the specific factor is often called the uniqueness, or specific variance:

\sigma_{ii} = h_i^2 + \psi_i
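A quick numeric illustration of these definitions, using made-up loadings for three standardized variables (so s_{ii} = 1):

```python
import numpy as np

# Communality h_i^2 = sum_k lambda_ik^2; uniqueness psi_i = s_ii - h_i^2.
# The loading values below are invented for illustration.
Lam = np.array([[0.7, 0.2],
                [0.6, 0.5],
                [0.1, 0.8]])
s_diag = np.ones(3)              # standardized variables: variances of 1

h2 = (Lam ** 2).sum(axis=1)      # communalities
psi = s_diag - h2                # uniquenesses (specific variances)
print(h2, psi)                   # each pair sums to the variance, 1
```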
Model Identification
The factor loadings found in EFA estimation are not unique.
Rather, factor loading matrices (\Lambda) can be rotated.
If T is an orthogonal (orthonormal) matrix (meaning T'T = I), then \Lambda and \Lambda^* = \Lambda T give the same factor representation.
Such rotations preserve the fit of the model, but allow for easier interpretation of the meanings of the factors by changing the loadings systematically.
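This rotational indeterminacy is easy to check numerically: for any orthogonal T, \Lambda T implies the same \Lambda\Lambda', so the fitted covariance structure is unchanged. A Python/NumPy sketch with invented loadings and an arbitrary two-dimensional rotation:

```python
import numpy as np

# Rotated loadings Lambda* = Lambda T (T orthogonal) imply the same
# covariance structure: (Lambda T)(Lambda T)' = Lambda Lambda'.
theta = 0.4                                   # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Lam = np.array([[0.7, 0.2],                   # invented loadings
                [0.6, 0.5],
                [0.1, 0.8]])

Lam_rot = Lam @ T
print(np.allclose(Lam_rot @ Lam_rot.T, Lam @ Lam.T))
```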
Model Estimation Methods
Because of the long history of factor analysis, many estimation methods have been developed.
Before the 1950s, the bulk of estimation methods were approximating heuristics, sacrificing accuracy for speedy calculations. Before computers became prominent, many graduate students spent months (if not years) on a single analysis.
Today, however, everything is done via computers, and a handful of methods are performed without risk of careless errors.
Three estimation methods that we will briefly discuss are:
- Principal component method.
- Principal factor method.
- Maximum likelihood.
Example #1
To demonstrate the estimation methods and results from EFA, let's begin with an example. In data from James Sidanius (http://www.ats.ucla.edu/stat/sas/output/factor.htm), evaluations of one instructor were obtained from 1428 students. Twelve items from the evaluations are used in the data set:
Example #1
[The twelve evaluation items were listed in a table image on the original slide.]
Principal Component Method
The principal component method for EFA takes a routine PCA and rescales the eigenvector weights to be factor loadings.
Recall that in PCA we created a set of new variables, Y_1, ..., Y_m, called the principal components. These variables had variances equal to the eigenvalues of the covariance matrix; for example, Var(Y_1) = \lambda_1^P (where \lambda_1^P represents the largest eigenvalue from the PCA).
Now, we must rescale the eigenvector weights so that they become factor loadings (which correspond to factors that have unit variances). The estimated factor loadings are computed by:

\hat{\lambda}_{jk} = \sqrt{\lambda_k^P} \, e_{jk}
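A minimal Python/NumPy sketch of the principal component method on an invented correlation matrix: take the eigendecomposition and scale each retained eigenvector by the square root of its eigenvalue:

```python
import numpy as np

# Principal component method: lambda_jk = sqrt(l_k) * e_jk, where l_k and
# e_k are the eigenvalues/eigenvectors. The correlation matrix is invented.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
vals, vecs = np.linalg.eigh(R)           # eigh returns ascending order
order = np.argsort(vals)[::-1]           # re-sort descending
vals, vecs = vals[order], vecs[:, order]

m = 1                                    # retain one factor
Lam = vecs[:, :m] * np.sqrt(vals[:m])    # p x m loading matrix
h2 = (Lam ** 2).sum(axis=1)              # communalities
print(Lam.round(3), h2.round(3))
```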
Principal Component Method Example
Additionally, the unique variances are found by:

\hat{\psi}_i = s_{ii} - \hat{h}_i^2

To run an EFA, we use proc factor from SAS. Note that the user guide for proc factor can be found at:
http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap26/index.htm
To run the analysis:

libname example 'C:\11_02\SAS Examples';
proc factor data=example.ex1 nfactors=2;
  var item13-item24;
run;
Principal Component Method Example
[SAS proc factor output shown as images on the original slides 22-23.]
Principal Component Method Example
Now, we will compare this result with the result we would obtain from a PCA:

proc princomp data=example.ex1 n=2;
  var item13-item24;
run;

Notice how \hat{\lambda}_{jk} = \sqrt{\lambda_k^P} \, e_{jk}.
Principal Factor Method
An alternative approach to estimating the EFA model is the principal factor method.
The principal factor method uses an iterative procedure to arrive at the final solution of estimates.
To begin, the procedure picks a set of communality values (h^2) and places these values along the diagonal of the correlation matrix, forming the reduced correlation matrix R_c.
The method then iterates between the following two steps until the change in communalities becomes negligible:
1. Using R_c, find the principal component method estimates of the communalities.
2. Replace the communalities in R_c with the current estimates.
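The two-step iteration above can be sketched as follows (Python/NumPy; the correlation matrix and starting communalities are invented, and this is a bare-bones illustration rather than SAS's method=prinit implementation):

```python
import numpy as np

# Iterated principal factor method: put trial communalities on the
# diagonal of R, extract loadings, update the communalities, repeat.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
m = 1                                    # one factor
h2 = np.full(3, 0.5)                     # starting communality guesses
for _ in range(200):
    Rc = R.copy()
    np.fill_diagonal(Rc, h2)             # reduced correlation matrix
    vals, vecs = np.linalg.eigh(Rc)
    order = np.argsort(vals)[::-1]
    Lam = vecs[:, order[:m]] * np.sqrt(np.maximum(vals[order[:m]], 0))
    h2_new = (Lam ** 2).sum(axis=1)      # step 1: new communalities
    if np.max(np.abs(h2_new - h2)) < 1e-8:
        break
    h2 = h2_new                          # step 2: replace and repeat
print(h2.round(4))
```

For this small matrix the correlations are exactly consistent with one factor, so the iteration settles at the triad solution (h2[0] near 0.75).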
Principal Factor Method
To run the analysis:

*SAS Example #2;
proc factor data=example.ex1 nfactors=2 method=prinit priors=random;
  var item13-item24;
run;
Principal Factor Method Example
[SAS proc factor output shown as images on the original slides 27-29.]
Maximum Likelihood Estimation
Perhaps the most popular method for obtaining EFA estimates is maximum likelihood (ML).
Similar to the principal factor method, ML proceeds iteratively.
The ML method uses the density function of the normal distribution as the function to optimize (finding the parameter estimates that lead to the maximum value). Recall that this was a function of:
- The data (X).
- The mean vector (\mu).
- The covariance matrix (\Sigma).
Here, \Sigma is formed by the model-predicted matrix equation \Sigma = \Lambda\Lambda' + \Psi (although some uniqueness conditions are specified).
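For a Python analogue, scikit-learn's FactorAnalysis fits the factor model by maximizing the normal likelihood (via an EM algorithm); it is similar in spirit to proc factor method=ml, though scaling and convergence details differ. A sketch on simulated data with invented parameters:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis  # ML-based factor model

# Simulate two-factor data (invented loadings), then recover the model
# by maximum likelihood and check the model-implied covariance.
rng = np.random.default_rng(2)
n, p, m = 5000, 6, 2
Lam_true = rng.uniform(0.4, 0.8, size=(p, m))
F = rng.standard_normal((n, m))
X = F @ Lam_true.T + rng.standard_normal((n, p)) * 0.5

fa = FactorAnalysis(n_components=m).fit(X)
Lam_hat = fa.components_.T               # p x m estimated loadings
Psi_hat = fa.noise_variance_             # estimated unique variances

# ML fit check: model-implied covariance vs. sample covariance.
Sigma_hat = Lam_hat @ Lam_hat.T + np.diag(Psi_hat)
print(np.abs(Sigma_hat - np.cov(X, rowvar=False)).max())
```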
ML Example
To run the analysis:

*SAS Example #3;
proc factor data=example.ex1 nfactors=2 method=ml;
  var item13-item24;
run;
Maximum Likelihood Example
[SAS proc factor ML output shown as images on the original slides 32-34; slide 33 includes the likelihood-ratio test for the number of factors.]
Iterative Algorithm Caution
With iterative algorithms, sometimes a solution does not exist.
When this happens, it is typically caused by what is called a Heywood case: an instance where a unique variance becomes less than or equal to zero (a communality greater than or equal to one).
To combat cases like these, SAS will allow you to set all communalities greater than one to one with the heywood option (placed on the proc line).
Some would advocate that fixing communalities is not good practice, because Heywood cases indicate deeper problems with the analysis. To not fix the communalities, SAS provides the ultraheywood option (placed on the proc line).
Estimation Method Comparison
If you look back through the output, you will see subtle differences in the solutions of the three methods.
What you may discover when fitting the principal component method and the ML method is that the ML factors sometimes account for less variance than the factors extracted through PCA.
This is because of the optimality criterion used for PCA, which maximizes the variance accounted for by each factor. ML, however, has an optimality criterion that minimizes the difference between the predicted and observed covariance matrices, so the extraction will better resemble the observed data.
Number of Factors
As with PCA, the number of factors to extract can be somewhat arbitrary.
Often, a scree plot is obtained to check for the number of factors. In SAS, there is a scree plot option:

*SAS Example #4;
proc factor data=example.ex1 nfactors=2 method=ml scree;
  var item13-item24;
run;

Also, with the ML method, SAS prints out a likelihood-ratio test for the number of factors extracted (see slide 33). This test tends to suggest that a large number of factors be extracted.
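Outside SAS, a scree check amounts to sorting the eigenvalues of the correlation matrix. A Python/NumPy sketch on invented data with two true factors, printing a crude text scree plot:

```python
import numpy as np

# Scree check: eigenvalues of the correlation matrix in descending order.
# Look for the "elbow" (or eigenvalues > 1). The data are invented, with
# exactly two factors built in.
rng = np.random.default_rng(3)
n, p = 2000, 8
Lam = np.zeros((p, 2))
Lam[:4, 0] = 0.7                     # first four variables load on factor 1
Lam[4:, 1] = 0.7                     # last four variables load on factor 2
F = rng.standard_normal((n, 2))
X = F @ Lam.T + rng.standard_normal((n, p)) * np.sqrt(1 - 0.49)

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
for k, ev in enumerate(eigvals, start=1):
    print(f"{k:2d}  {ev:5.2f}  " + "*" * int(round(ev * 10)))
```

With two built-in factors, the first two eigenvalues dominate and the rest fall well below 1.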
Scree Plot Example
[Scree plot shown as an image on the original slide.]
Factor Rotations
As mentioned previously, a rotation is often made to aid in the interpretation of the extracted factors.
Orthogonal rotations are given by methods such as:
- Varimax.
- Quartimax.
- Equamax.
Oblique (non-orthogonal) rotations allow for greater interpretability by allowing factors to become correlated. Examples are:
- Promax.
- Procrustes (needs a target to rotate to).
- Harris-Kaiser.
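As an illustration of what an orthogonal rotation does, here is a minimal Python/NumPy implementation of Kaiser's varimax criterion (the standard SVD-based iteration). This is a sketch, not SAS's exact rotate=varimax; normalization conventions differ across packages:

```python
import numpy as np

def varimax(Lam, gamma=1.0, max_iter=100, tol=1e-8):
    """Rotate loadings Lam by Kaiser's varimax criterion (SVD iteration)."""
    p, m = Lam.shape
    T = np.eye(m)                  # accumulated orthogonal rotation matrix
    d = 0.0
    for _ in range(max_iter):
        L = Lam @ T
        u, s, vt = np.linalg.svd(
            Lam.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))))
        T = u @ vt                 # product of orthogonal matrices
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return Lam @ T, T

# Invented unrotated loadings; rotation sharpens the simple structure
# while leaving Lambda Lambda' (and hence model fit) unchanged.
Lam = np.array([[0.6, 0.5], [0.7, 0.4], [0.5, -0.5], [0.6, -0.6]])
Lam_rot, T = varimax(Lam)
print(Lam_rot.round(3))
```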
Orthogonal Rotation Example
To demonstrate an orthogonal rotation, consider the following code, which will produce a varimax transformation of the factor loadings found by ML:

*SAS Example #5;
proc factor data=example.ex1 nfactors=2 method=ml rotate=varimax;
  var item13-item24;
run;
Orthogonal Rotation Example
[Varimax-rotated loadings shown as an image on the original slide.]
Oblique Rotation Example
To demonstrate an oblique rotation, consider the following code, which will produce a promax transformation of the factor loadings found by ML:

*SAS Example #6;
proc factor data=example.ex1 nfactors=2 method=ml rotate=promax;
  var item13-item24;
run;
Oblique Rotation Example
[Promax-rotated loadings shown as an image on the original slide.]
Factor Scores
Recall that in PCA, the principal components (Y, the linear combinations) were the focus.
In factor analysis, the factor scores are typically an afterthought.
In fact, because of the assumptions of the model, direct computation of factor scores by linear combinations of the original data is not possible.
Alternatives exist for estimating factor scores, but even if the data fit the model perfectly, the factor scores obtained will not reproduce the model parameters exactly.
In SAS, to obtain factor scores, place out=newdata on the proc line; the factor scores will be placed in the newdata data set.
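One common alternative is the regression (Thomson) estimator, \hat{F}_i = \Lambda' \Sigma^{-1} (X_i - \mu). The Python/NumPy sketch below (with invented parameters) shows that even when the true \Lambda and \Psi are used, the estimated scores correlate highly, but not perfectly, with the true factors:

```python
import numpy as np

# Regression (Thomson) factor score estimator:
# F_hat = (X - mean) Sigma^{-1} Lambda, with Sigma = Lambda Lambda' + Psi.
# The estimates are shrunken toward zero, so they never reproduce the
# true factors exactly. All parameter values are invented.
rng = np.random.default_rng(4)
n, p, m = 5000, 6, 2
Lam = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
Psi = np.diag(1 - (Lam ** 2).sum(axis=1))     # standardized variables

F = rng.standard_normal((n, m))
X = F @ Lam.T + rng.multivariate_normal(np.zeros(p), Psi, size=n)

Sigma = Lam @ Lam.T + Psi
F_hat = (X - X.mean(axis=0)) @ np.linalg.solve(Sigma, Lam)
# correlation between true and estimated scores: high, but < 1
print(np.corrcoef(F[:, 0], F_hat[:, 0])[0, 1])
```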
Factor Scores Example

*SAS Example #7;
proc factor data=example.ex1 nfactors=2 method=ml rotate=promax out=example.newdata;
  var item13-item24;
run;
CFA Example
Rather than trying to determine the number of factors and, subsequently, what the factors mean (as in EFA), if you already know the structure of your data, you can use a confirmatory approach.
Confirmatory factor analysis (CFA) is a way to specify which variables load onto which factors. The loadings of all variables not related to a given factor are set to zero.
For a reasonable number of parameters, the factor correlation can be estimated directly from the analysis (rotations are not needed).
CFA Example
As an example, consider the data given on p. 502 of Johnson and Wichern: Lawley and Maxwell present the sample correlation matrix of scores for 220 male students on six subject tests:
- Gaelic.
- English.
- History.
- Arithmetic.
- Algebra.
- Geometry.
CFA Example
It seems plausible that these subjects should load onto one of two types of ability: verbal and mathematical. If we were to specify the pattern of loadings, the factor loading matrix might look like:

            Verbal        Math
\Lambda = [ \lambda_{11}  0            ]  Gaelic
          [ \lambda_{21}  0            ]  English
          [ \lambda_{31}  0            ]  History
          [ 0             \lambda_{42} ]  Arithmetic
          [ 0             \lambda_{52} ]  Algebra
          [ 0             \lambda_{62} ]  Geometry
CFA Example
The model-predicted covariance matrix would then be:

\Sigma = \Lambda \Phi \Lambda' + \Psi

Where:
- \Phi is the factor correlation matrix (here of size 2 x 2).
- \Psi is a diagonal matrix of unique variances.

Specifically:

\Sigma =
[ \lambda_{11}^2+\psi_1             \lambda_{11}\lambda_{21}           \lambda_{11}\lambda_{31}           \lambda_{11}\phi_{12}\lambda_{42}  \lambda_{11}\phi_{12}\lambda_{52}  \lambda_{11}\phi_{12}\lambda_{62} ]
[ \lambda_{11}\lambda_{21}          \lambda_{21}^2+\psi_2              \lambda_{21}\lambda_{31}           \lambda_{21}\phi_{12}\lambda_{42}  \lambda_{21}\phi_{12}\lambda_{52}  \lambda_{21}\phi_{12}\lambda_{62} ]
[ \lambda_{11}\lambda_{31}          \lambda_{21}\lambda_{31}           \lambda_{31}^2+\psi_3              \lambda_{31}\phi_{12}\lambda_{42}  \lambda_{31}\phi_{12}\lambda_{52}  \lambda_{31}\phi_{12}\lambda_{62} ]
[ \lambda_{11}\phi_{12}\lambda_{42} \lambda_{21}\phi_{12}\lambda_{42}  \lambda_{31}\phi_{12}\lambda_{42}  \lambda_{42}^2+\psi_4              \lambda_{42}\lambda_{52}           \lambda_{42}\lambda_{62} ]
[ \lambda_{11}\phi_{12}\lambda_{52} \lambda_{21}\phi_{12}\lambda_{52}  \lambda_{31}\phi_{12}\lambda_{52}  \lambda_{42}\lambda_{52}           \lambda_{52}^2+\psi_5              \lambda_{52}\lambda_{62} ]
[ \lambda_{11}\phi_{12}\lambda_{62} \lambda_{21}\phi_{12}\lambda_{62}  \lambda_{31}\phi_{12}\lambda_{62}  \lambda_{42}\lambda_{62}           \lambda_{52}\lambda_{62}           \lambda_{62}^2+\psi_6 ]
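Constructing this model-implied matrix numerically makes the pattern easy to see. A Python/NumPy sketch with invented loading and correlation values for the verbal/math structure:

```python
import numpy as np

# CFA-implied covariance Sigma = Lambda Phi Lambda' + Psi for the
# two-factor verbal/math structure. All numeric values are invented.
lam = np.array([[0.7, 0.0],   # Gaelic     (verbal)
                [0.8, 0.0],   # English    (verbal)
                [0.6, 0.0],   # History    (verbal)
                [0.0, 0.7],   # Arithmetic (math)
                [0.0, 0.8],   # Algebra    (math)
                [0.0, 0.6]])  # Geometry   (math)
phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])  # factor correlation matrix
psi = np.diag(1 - (lam ** 2).sum(axis=1))  # unique variances (standardized)

Sigma = lam @ phi @ lam.T + psi
# Cross-block entries pick up the factor correlation phi_12,
# e.g. Sigma[0, 3] = lam_11 * phi_12 * lam_42.
print(Sigma.round(3))
```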
CFA Example
Using an optimization routine (and some criterion function, such as ML), the parameter estimates that minimize the discrepancy function are found.
To assess the fit of the model, the predicted covariance matrix is subtracted from the observed covariance matrix, and the residuals are summarized into fit statistics.
Based on the goodness of fit of the model, the result is taken as-is, or modifications are made to the structure.
CFA is a measurement model: the factors are measured by the data. SEM is a model for the covariance between the factors.
CFA Example
[SAS output shown as images on the original slides 51-53.]
CFA Example
Factor Correlation Matrix: [shown as an image on the original slide.]

CFA Example
Factor Loading Matrix: [shown as an image on the original slide.]

CFA Example
Uniqueness Matrix: [shown as an image on the original slide.]
Final Thought
EFA shares many features with PCA, but is primarily used to explain the intercorrelation of variables rather than to develop new linear combinations of variables.
CFA is concerned with assessing the plausibility of a structural model for observed data.
We have only scratched the surface of topics in EFA and CFA; this class only introduced concepts in factor analysis. For a more thorough treatment of the subject (and other measurement topics), consider taking my Psychology 892 (Measurement Methods) course next semester.
Next Time
- Midterm discussion.
- Canonical correlation.
- More examples.