Factor Analysis
Lecture 10
August 2, 2011
Advanced Multivariate Statistical Methods
ICPSR Summer Session #2
Lecture #10-8/3/2011 Slide 1 of 55
Today's Lecture
- Exploratory factor analysis (EFA)
- Confirmatory factor analysis (CFA)
- How to do an EFA or CFA
- Comparisons to PCA
History of Factor Analysis
Spearman's seminal 1904 paper has shaped psychology as a science:
Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201-293.
Notable quotations from this paper:
"Most of those hostile to Experimental Psychology are in the habit of reproaching its methods with insignificance, and even with triviality...they protest that such means can never shed any real light upon the human soul, unlock the eternal antinomy of Free Will, or reveal the inward nature of Time and Space." (p. 203)
"The present article, therefore, advocates a Correlational Psychology." (p. 205)
Measurement of Intelligence
The idea Spearman was pursuing with his work was a way to pin down intelligence.
At the time, psychologists thought that intelligence could be defined by a single, all-encompassing unobservable entity, called g (for general intelligence).
In his paper, Spearman sought to describe the influence of g on examinees' test scores in several domains: Pitch, Light, Weight, Classics, French, English, and Mathematics.
In reality, g may or may not exist, but postulating g provides a mechanism to detect common correlations among such variables.
Measurement of Intelligence
The model proposed by Spearman was very similar to a linear regression model:

X_i1 = μ_1 + λ_1 g_i + ε_i1
X_i2 = μ_2 + λ_2 g_i + ε_i2
...
X_ip = μ_p + λ_p g_i + ε_ip

Here:
- X_ij is the score of examinee i (i = 1,...,n) on test domain j (j = 1,...,p)
- μ_j is the mean of test domain j
- g_i is the value of the intelligence factor for person i
- λ_j is the loading of test domain j onto the general ability factor g
- ε_ij is the random error term for person i and test domain j
Spearman Model Assumptions
The Spearman factor model has the following assumptions:
- E(g) = μ_g = 0
- Var(g) = 1
- E(ε_ij) = 0 for all i and j
- Var(ε) = φ

Although the Spearman model is very similar to a regression model (just replace g with an observed variable), estimation of the model cannot proceed like regression because g is not observed (we will get to model estimation shortly).
Note that the Spearman model is still very much in effect today, being the basis for Item Response Theory (IRT) - the methods used to estimate ability from test scores (and how you received your score on the GRE).
Common Factor Model
As theories of intelligence began to change from generalized ability to specialized abilities, the Spearman model with a single latent construct became less popular.
In the 1930s, L. L. Thurstone developed the common factor model.
The common factor model posited that scores were a function of multiple latent variables - variables that represented more specialized abilities.
Common Factor Model
The common factor model was also very similar to a linear multiple regression model:

X_i1 = μ_1 + λ_11 f_i1 + λ_12 f_i2 + ... + λ_1m f_im + ε_i1
X_i2 = μ_2 + λ_21 f_i1 + λ_22 f_i2 + ... + λ_2m f_im + ε_i2
...
X_ip = μ_p + λ_p1 f_i1 + λ_p2 f_i2 + ... + λ_pm f_im + ε_ip

Where:
- X_ij is the response of person i (i = 1,...,n) on variable j (j = 1,...,p)
- μ_j is the mean of variable j
- f_ik is the factor score for person i on factor k (k = 1,...,m)
- λ_jk is the loading of variable j onto factor k
- ε_ij is the random error term for person i and variable j
Common Factor Model
As you could probably guess, the common factor model can be put more succinctly with matrices:

X_i = μ + Λ F_i + ε_i
(p×1) (p×1) (p×m)(m×1) (p×1)

Where:
- X_i is the response vector of person i (i = 1,...,n), containing variables j = 1,...,p
- μ is the mean vector (containing the means of all variables)
- F_i is the factor score vector for person i, containing factor scores k = 1,...,m
- Λ is the factor loading matrix (factor pattern matrix)
- ε_i is the random error vector for person i, containing errors for variables j = 1,...,p
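The lecture's examples use SAS; as a language-neutral illustration, the matrix equation above can be sketched in plain Python. All numeric values here are made up for illustration (p = 3 variables, m = 2 factors); nothing is taken from the lecture's data.

```python
# Minimal sketch of the common factor model X_i = mu + Lambda F_i + eps_i.
# All numbers are hypothetical.

mu  = [10.0, 20.0, 30.0]            # variable means (p x 1)
Lam = [[0.8, 0.1],                  # factor loading matrix (p x m)
       [0.7, 0.2],
       [0.1, 0.9]]
F_i   = [1.2, -0.5]                 # person i's factor scores (m x 1)
eps_i = [0.05, -0.10, 0.02]         # person i's error terms (p x 1)

# X_ij = mu_j + sum_k lambda_jk * f_ik + eps_ij
X_i = [mu[j] + sum(Lam[j][k] * F_i[k] for k in range(2)) + eps_i[j]
       for j in range(3)]
```

Each observed score is the variable's mean plus a loading-weighted sum of that person's factor scores plus error, exactly as in the scalar equations on the previous slide.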
Common Factor Model Assumptions
Depending upon the assumptions made for the common factor model, two types of factor analyses are defined:
- Exploratory factor analysis (EFA)
- Confirmatory factor analysis (CFA)

EFA seeks to determine:
- The number of factors that exist
- The relationship between each variable and each factor

CFA seeks to:
- Confirm or reject the factor structure presumed by the analysis
- Estimate the relationships among the factors

CFA and, subsequently, Structural Equation Modeling (SEM) were extensions of this framework.
Assumptions
Exploratory factor analysis makes assumptions that allow for estimation of all factor loadings for each requested factor.
Given the common factor model:

X_i = μ + Λ F_i + ε_i
(p×1) (p×1) (p×m)(m×1) (p×1)

The assumptions are:
- F_i and ε_i are independent
- E(F) = 0
- Cov(F) = I (the key assumption in today's EFA lecture - uncorrelated factors)
- E(ε) = 0
- Cov(ε) = Ψ, where Ψ is a diagonal matrix
Implications
Due to the model parameterization and assumptions, the common factor model specifies the following covariance structure for the observable data:

Cov(X) = Σ = ΛΛ' + Ψ

To illustrate what this looks like:

Var(X_i) = σ_ii = λ_i1² + ... + λ_im² + ψ_i
Cov(X_i, X_k) = σ_ik = λ_i1 λ_k1 + ... + λ_im λ_km

The model-specified covariance matrix, Σ, illustrates the background assumption of the factor model: variable correlations are a function of the factors in the model.
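The covariance structure Σ = ΛΛ' + Ψ (for uncorrelated factors) can be verified numerically. The loadings and unique variances below are hypothetical:

```python
# Model-implied covariance matrix Sigma = Lambda Lambda' + Psi
# for uncorrelated factors (p = 3, m = 2). Numbers are hypothetical.

Lam = [[0.8, 0.1],
       [0.7, 0.2],
       [0.1, 0.9]]
Psi = [0.35, 0.47, 0.18]            # diagonal of the unique-variance matrix

p, m = 3, 2
Sigma = [[sum(Lam[i][k] * Lam[j][k] for k in range(m))
          + (Psi[i] if i == j else 0.0)
          for j in range(p)] for i in range(p)]

# Var(X_1)      = 0.8^2 + 0.1^2 + 0.35 = 1.00
# Cov(X_1, X_2) = 0.8*0.7 + 0.1*0.2   = 0.58
```

Note that the off-diagonal covariances depend only on the loadings: the correlations among variables are entirely carried by the common factors.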
More Implications
The common factor model also specifies that the factor loadings give the covariances between the observable variables and the unobserved factors:

Cov(X, F) = Λ

Another way of putting this statement is:

Cov(X_i, F_j) = λ_ij
EFA Definitions
Because of how the EFA model is estimated, a couple of definitions are needed.
From two slides ago, we noted that the model-predicted variance is defined as:

σ_ii = (λ_i1² + ... + λ_im²) + ψ_i
Var(X_i) = communality + specific variance

The proportion of variance of the i-th variable contributed by the m common factors is called the i-th communality:

h_i² = λ_i1² + ... + λ_im²

The proportion of variance of the i-th variable due to the specific factor is often called the uniqueness, or specific variance:

σ_ii = h_i² + ψ_i
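A quick numeric check of the communality/uniqueness decomposition, with hypothetical loadings for one standardized variable:

```python
# Communality h_i^2 (variance explained by the common factors) and
# uniqueness psi_i (what is left over). Values are hypothetical.

loadings_i = [0.8, 0.1]     # variable i's loadings on m = 2 factors
var_i = 1.0                 # total variance of variable i (standardized)

h2_i  = sum(l * l for l in loadings_i)   # communality = 0.64 + 0.01
psi_i = var_i - h2_i                     # specific variance
```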
Model Identification
The factor loadings found in EFA estimation are not unique.
Rather, factor loading matrices (Λ) can be rotated: if T is an orthogonal (orthonormal) matrix (meaning T'T = I), then Λ and Λ* = ΛT give the same factor representation.
Such rotations preserve the fit of the model, but allow for easier interpretation of the meanings of the factors by changing the loadings systematically.
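The rotation indeterminacy can be checked directly: since (ΛT)(ΛT)' = ΛT T'Λ' = ΛΛ', a rotated loading matrix implies exactly the same covariance structure. A sketch with hypothetical loadings and a 30-degree planar rotation:

```python
import math

# Hypothetical loading matrix (p = 3, m = 2)
Lam = [[0.8, 0.1],
       [0.7, 0.2],
       [0.1, 0.9]]

th = math.radians(30)
T = [[math.cos(th), -math.sin(th)],     # orthogonal rotation matrix: T'T = I
     [math.sin(th),  math.cos(th)]]

# Rotated loadings Lambda* = Lambda T
LamT = [[sum(Lam[i][k] * T[k][j] for k in range(2)) for j in range(2)]
        for i in range(3)]

def LLt(L):
    """Compute L L' for a 3 x 2 matrix."""
    return [[sum(L[i][k] * L[j][k] for k in range(2)) for j in range(3)]
            for i in range(3)]

A, B = LLt(Lam), LLt(LamT)
max_diff = max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))
# max_diff is zero up to floating-point rounding: same fit, different loadings
```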
Model Estimation Methods
Because of the long history of factor analysis, many estimation methods have been developed.
Before the 1950s, the bulk of estimation methods were approximating heuristics - sacrificing accuracy for speedy calculations.
Before computers became prominent, many graduate students spent months (if not years) on a single analysis.
Today, however, everything is done via computers, and a handful of methods can be performed without risk of careless errors.
Three estimation methods that we will briefly discuss are:
- Principal component method
- Principal factor method
- Maximum likelihood
Example #1
To demonstrate the estimation methods and results from EFA, let's begin with an example.
In data from James Sidanius (http://www.ats.ucla.edu/stat/sas/output/factor.htm), evaluations of an instructor were obtained from 1428 students.
Twelve items from the evaluations are used in the data set.
Principal Component Method
The principal component method for EFA takes a routine PCA and rescales the eigenvector weights to be factor loadings.
Recall that in PCA we created a set of new variables, Y_1,...,Y_m, called the principal components.
These variables had variances equal to the eigenvalues of the covariance matrix - for example, Var(Y_1) = λ_1^P (where λ_1^P represents the largest eigenvalue from a PCA).
Now, we must rescale the eigenvector weights so that they become factor loadings (which correspond to factors that have unit variances).
The estimated factor loadings are computed by:

λ̂_jk = √(λ_k^P) e_jk
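For a 2 × 2 correlation matrix the eigendecomposition is available in closed form (eigenvalues 1 + r and 1 − r, eigenvectors (1, 1)/√2 and (1, −1)/√2), so the rescaling step can be sketched without a numeric eigensolver. The value r = 0.5 is hypothetical:

```python
import math

r = 0.5                                 # hypothetical correlation
eigvals = [1 + r, 1 - r]                # eigenvalues of [[1, r], [r, 1]]
eigvecs = [[1 / math.sqrt(2),  1 / math.sqrt(2)],    # e_1
           [1 / math.sqrt(2), -1 / math.sqrt(2)]]    # e_2

# Principal component method: loading_jk = sqrt(eigval_k) * e_jk
loadings = [[math.sqrt(eigvals[k]) * eigvecs[k][j] for k in range(2)]
            for j in range(2)]

# When all p components are kept, the communalities recover the full
# (standardized) variance of each variable: h_j^2 = 1
h2 = [sum(l * l for l in row) for row in loadings]
```

The rescaling by √(λ_k^P) is what converts unit-length eigenvectors into loadings on unit-variance factors.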
Principal Component Method Example
Additionally, the unique variances are found by:

ψ̂_i = s_ii - ĥ_i²

To run an EFA, we use proc factor from SAS.
Note that the user guide for proc factor can be found at:
http://support.sas.com/documentation/cdl/en/statug/63347/html/default/viewer.ht
To run the analysis:

proc factor data=example.ex1 nfactors=2;
var item13-item24;
run;
Principal Component Method Example (SAS output)
Principal Component Method Example
Now, we will compare this result with the result we would obtain from a PCA:

proc princomp data=example.ex1 n=2;
var item13-item24;
run;

Notice how λ̂_jk = √(λ_k^P) e_jk.
Principal Factor Method
An alternative approach to estimating the EFA model is the principal factor method.
The principal factor method uses an iterative procedure to arrive at the final solution of estimates.
To begin, the procedure picks a set of communality values (h²) and places these values along the diagonal of the correlation matrix, giving the reduced correlation matrix R_c.
The method then iterates between the following two steps until the change in communalities becomes negligible:
1. Using R_c, find the principal component method estimates of the communalities
2. Replace the communalities in R_c with the current estimates
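The two-step iteration can be sketched in plain Python for a single factor. Power iteration stands in for a full eigendecomposition, and the 3 × 3 correlation matrix is hypothetical (not from the lecture's data set):

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def top_eigen(A, iters=500):
    """Largest eigenvalue/eigenvector of a small symmetric matrix
    via power iteration (a stand-in for a full eigendecomposition)."""
    v = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(matvec(A, v), v))
    return lam, v

R = [[1.0, 0.6, 0.5],       # hypothetical correlation matrix
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

h2 = [0.5, 0.5, 0.5]        # starting communalities (arbitrary priors)
for _ in range(100):
    Rc = [row[:] for row in R]
    for i in range(3):
        Rc[i][i] = h2[i]                       # reduced correlation matrix R_c
    lam, e = top_eigen(Rc)
    loadings = [lam ** 0.5 * ei for ei in e]   # principal-component-style step
    new_h2 = [l * l for l in loadings]         # updated communalities
    if max(abs(a - b) for a, b in zip(new_h2, h2)) < 1e-8:
        break
    h2 = new_h2
```

After convergence the communalities are consistent with the extracted loadings, which is exactly the fixed point the two-step iteration seeks.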
Principal Factor Method
To run the analysis:

*SAS Example #2;
proc factor data=example.ex1 nfactors=2 method=prinit priors=random;
var item13-item24;
run;
Principal Factor Method Example (SAS output)
Maximum Likelihood Estimation
Perhaps the most popular method for obtaining EFA estimates is maximum likelihood (ML).
Similar to the principal factor method, ML proceeds iteratively.
The ML method uses the density function of the normal distribution as the function to optimize (finding the parameter estimates that lead to the maximum value).
Recall that this was a function of:
- The data (X)
- The mean vector (μ)
- The covariance matrix (Σ)

Here, Σ is formed by the model-predicted matrix equation Σ = ΛΛ' + Ψ (although some uniqueness conditions are specified).
ML Example
To run the analysis:

*SAS Example #3;
proc factor data=example.ex1 nfactors=2 method=ml;
var item13-item24;
run;
ML Example (SAS output)
Iterative Algorithm Caution
With iterative algorithms, sometimes a solution does not exist.
When this happens, it is typically caused by what is called a Heywood case - an instance where a unique variance becomes less than or equal to zero (a communality greater than or equal to one).
To combat cases like these, SAS will allow you to set all communalities greater than one to one with the heywood option (placed on the proc line).
Some would advocate that fixing communalities is not good practice, because Heywood cases indicate deeper problems with the analysis.
To leave the communalities unfixed, SAS provides the ultraheywood option (placed on the proc line).
Estimation Method Comparison
If you look back through the output, you will see subtle differences in the solutions of the three methods.
What you may discover when comparing the principal component method and the ML method is that the ML factors sometimes account for less variance than the factors extracted through PCA.
This is because of the optimality criterion used for PCA, which attempts to maximize the variance accounted for by each factor.
ML, however, has an optimality criterion that minimizes the differences between the predicted and observed covariance matrices, so the extraction will better resemble the observed data.
Number of Factors
As with PCA, the number of factors to extract can be somewhat arbitrary.
Often, a scree plot is obtained to check for the number of factors.
In SAS, there is a scree plot option:

*SAS Example #4;
proc factor data=example.ex1 nfactors=2 method=ml scree;
var item13-item24;
run;

Also, with the ML method, SAS prints out a likelihood-ratio test for the number of factors extracted (see slide 33).
This test tends to suggest that a great number of factors need to be extracted.
Scree Plot Example (SAS output)
Factor Rotations
As mentioned previously, a rotation is often made to aid in the interpretation of the extracted factors.
Orthogonal rotations are given by methods such as:
- Varimax
- Quartimax
- Equamax

Oblique (non-orthogonal) rotations allow for greater interpretability by allowing factors to become correlated. Examples are:
- Promax
- Procrustes (needs a target to rotate to)
- Harris-Kaiser
Orthogonal Rotation Example
To demonstrate an orthogonal rotation, consider the following code that will produce a varimax transformation of the factor loadings found by ML:

*SAS Example #5;
proc factor data=example.ex1 nfactors=2 method=ml rotate=varimax;
var item13-item24;
run;
Orthogonal Rotation Example (SAS output)
Oblique Rotation Example
To demonstrate an oblique rotation, consider the following code that will produce a promax transformation of the factor loadings found by ML:

*SAS Example #6;
proc factor data=example.ex1 nfactors=2 method=ml rotate=promax;
var item13-item24;
run;
Oblique Rotation Example (SAS output)
Factor Scores
Recall that in PCA, the principal components (Y - the linear combinations) were the focus.
In factor analysis, the factor scores are typically an afterthought.
In fact, because of the assumptions of the model, direct computation of factor scores by linear combinations of the original data is not possible.
Alternatives exist for estimating factor scores, but even if the data fit the model perfectly, the estimated factor scores will not reproduce the model parameters exactly.
In SAS, to obtain factor scores, place out=newdata on the proc line; the factor scores will be placed in the newdata data set.
Factor Scores Example

*SAS Example #7;
proc factor data=example.ex1 nfactors=2 method=ml rotate=promax out=example.newdata;
var item13-item24;
run;
CFA Example
Rather than trying to determine the number of factors and, subsequently, what the factors mean (as in EFA), if you already know the structure of your data, you can use a confirmatory approach.
Confirmatory factor analysis (CFA) is a way to specify which variables load onto which factors.
The loadings of all variables not related to a given factor are then set to zero.
For a reasonable number of parameters, the factor correlation can be estimated directly from the analysis (rotations are not needed).
CFA Example
As an example, consider the data given on p. 502 of Johnson and Wichern: Lawley and Maxwell present the sample correlation matrix of examinee scores for six subject areas from 220 male students.
The subject tests are: Gaelic, English, History, Arithmetic, Algebra, and Geometry.
CFA Example
It seems plausible that these subjects should load onto one of two types of ability: verbal and mathematical.
If we were to specify the pattern of loadings, the factor loading matrix might look like:

        Verbal  Math
Λ =   [ λ_11    0   ]  Gaelic
      [ λ_21    0   ]  English
      [ λ_31    0   ]  History
      [  0     λ_42 ]  Arithmetic
      [  0     λ_52 ]  Algebra
      [  0     λ_62 ]  Geometry
CFA Example
The model-predicted covariance matrix would then be:

Σ = ΛΦΛ' + Ψ

Where:
- Φ is the factor correlation matrix (here of size 2×2)
- Ψ is a diagonal matrix of unique variances

Specifically, with φ_12 the correlation between the two factors, the entries of Σ are:

- Diagonal:                 σ_jj = λ_j1² + ψ_j (verbal tests) or λ_j2² + ψ_j (math tests)
- Within the verbal block:  σ_jk = λ_j1 λ_k1
- Within the math block:    σ_jk = λ_j2 λ_k2
- Across the two blocks:    σ_jk = λ_j1 φ_12 λ_k2

For example, Cov(Gaelic, English) = λ_11 λ_21 and Cov(Gaelic, Arithmetic) = λ_11 φ_12 λ_42.
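The constrained structure Σ = ΛΦΛ' + Ψ is easy to build numerically. The loadings, factor correlation, and unique variances below are made-up values chosen so that each variable is standardized; nothing here comes from the Lawley and Maxwell data:

```python
# CFA model-implied covariance matrix Sigma = Lambda Phi Lambda' + Psi
# for the two-factor verbal/math pattern. All numbers are hypothetical.

Lam = [[0.8, 0.0],   # Gaelic     (verbal)
       [0.7, 0.0],   # English    (verbal)
       [0.6, 0.0],   # History    (verbal)
       [0.0, 0.8],   # Arithmetic (math)
       [0.0, 0.7],   # Algebra    (math)
       [0.0, 0.6]]   # Geometry   (math)
Phi = [[1.0, 0.5],
       [0.5, 1.0]]   # factor correlation phi_12 = 0.5 (hypothetical)
Psi = [1 - 0.64, 1 - 0.49, 1 - 0.36,   # unique variances chosen so each
       1 - 0.64, 1 - 0.49, 1 - 0.36]   # diagonal entry of Sigma equals 1

def implied_cov(Lam, Phi, Psi):
    p, m = len(Lam), len(Phi)
    S = [[sum(Lam[i][a] * Phi[a][b] * Lam[j][b]
              for a in range(m) for b in range(m))
          for j in range(p)] for i in range(p)]
    for i in range(p):
        S[i][i] += Psi[i]
    return S

Sigma = implied_cov(Lam, Phi, Psi)
# within-block covariance: lam_i1 * lam_j1 (e.g., 0.8 * 0.7 = 0.56)
# cross-block covariance:  lam_i1 * phi_12 * lam_j2 (e.g., 0.8 * 0.5 * 0.8 = 0.32)
```

During estimation, an optimizer adjusts the free parameters (loadings, φ_12, unique variances) so that this implied matrix gets as close as possible to the observed correlation matrix.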
CFA Example
Using an optimization routine (and some type of criterion function, such as ML), the parameter estimates that minimize the function are found.
To assess the fit of the model, the predicted covariance matrix is subtracted from the observed covariance matrix, and the residuals are summarized into fit statistics.
Based on the goodness of fit of the model, the result is taken as-is, or modifications are made to the structure.
CFA is a measurement model - the factors are measured by the data; SEM is a model for the covariances between the factors.
CFA Example (SAS output): factor correlation matrix, factor loading matrix, and uniqueness matrix.
Final Thought
EFA shares many features with PCA, but is primarily used to examine the intercorrelations of variables rather than to develop new linear combinations of variables.
CFA is concerned with assessing the plausibility of a structural model for observed data.
We have only scratched the surface of topics in EFA and CFA.
Up Next: Canonical correlation