LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS


NOTES FROM PRE-LECTURE RECORDING ON PCA

PCA and EFA have similar goals but are substantially different in important ways. The goal is to collapse 20 items into a smaller number of more meaningful variables that explain the data; simply adding up the variables may not be the optimal way to combine them. Items with large component values are important in explaining the data, so we can interpret the component and give it a name. Component 2 explains another dimension of the data. When you rotate the graphical representation, the straight line that explains most of the data is the first principal component. We want to find the weights for the variables that make up the first principal component: this component has maximum variance, and that variance is the first eigenvalue. The second principal component maximises the explanation of the remaining variance and has zero correlation with the first component, so it gives an independent explanation of variance. PCA weights the different variables so as to account for as much variation in the data as possible. Each principal component you extract is an additive, weighted sum of the individual variables, so you can explain most of the variation without needing 20 distinct variables. Once you have extracted components, you can rotate again to simplify the structure and aid interpretation of the components: the rotation shrinks the smaller loadings so they can be ignored, which makes the components easier to interpret.

NOTES FROM PRE-LECTURE RECORDING ON EFA

The goal is similar to PCA, but the technique is different. EFA involves an assumption of an underlying factor that causes the correlations among the variables. PCA is a mathematical matrix technique, whereas EFA assumes an underlying explanatory model in the data, such as intelligence or self-esteem. We are searching for this underlying factor, which explains lots of different observations of intelligence (lots of different variables). In EFA, the observed variables are derived from the common factors. Common factors are similar to principal components, but common factors explain the observed variables, because the observations come from the overarching factor. Diagram: you have four intelligence tests (X1, X2, X3, X4) that measure the overarching factor of intelligence, F. The diagram describes the common factor model. The overarching factor F explains the observed scores on each X test, to the extent of the lambda term for each X. You would expect the tests to be correlated, but once the common factor is taken into account they no longer are, because they are all explained by the common factor: there are then no relations among the tests, and each test has only its own specific factor.

The underlying rationale involves partial correlations among variables. We can analyse whether the correlation between two items is fully explained by an overarching variable. The goal of factor analysis is to find a latent variable which accounts for the observed correlations and renders the correlation between items, once the latent variable is taken into account, zero. The aim is to find a latent or unobserved variable which, when correlated with our observed variables, leads to partial correlations between the observed variables that are as close to zero as we can get. The communality is the variance due to common factors: the variance in each variable that is due to the overarching common factor. What is left over after the communality is the residual variance that comes from the individual variable itself.

How do you find these latent factors? There are various types of exploratory factor analysis; maximum likelihood and principal axis factoring are recommended.
- Maximum likelihood is good because it has a test of fit. If the test is significant, the model does not fit the data. We do not want to reject the null hypothesis here, so we want p values above .05. The problem with this test is that it is very sensitive to sample size, so with larger samples it picks up trivial mis-specifications too easily.
- There are various ways to take this into account, such as the RMSEA: if it is less than 0.05, the model is a good fit.

Once we have identified the factors, we can estimate factor scores for each case. One option is to form the sum of scores for the items that load on a factor. This assumes equal weights for each item (a tau-equivalent test); if this is not true, alpha is a serious underestimate. Another option assumes varying factor loadings; such measures are known as congeneric tests, and the approach is what was described for principal components analysis: finding the weights for each variable. With unequal loadings, the options are the regression method, the Bartlett method, or the Anderson-Rubin method. Here, you create new variables (factors) from your multiple scores, save them, and run further analyses on the factor rather than on the separate tests, because the new factor combines the weights of the individual tests based on an assumed underlying latent variable.

NOTES FROM LECTURE

ASSOCIATIONS AMONG VARIABLES

Say you measure people on a 20-item scale: this gives you 20 variables. As this is too many to use practically in research, we can use the associations among the variables to condense them into a smaller number. There are various techniques for associating variables:
o Cluster analysis places variables in groups organised hierarchically, where each variable is in one and only one cluster.
o Multidimensional scaling shows relationships among variables by spatial configuration.
o Principal component analysis combines variables into weighted sums (components) to simplify the analysis.
o Factor analysis shows relationships among variables through their relationships to hypothetical underlying factors or latent variables.
Principal component and factor analysis both use correlations as their underlying association among variables, whereas cluster analysis and MDS use proximities. These are similar methods, but they are distinct.

Why bother?
o If you have 30 variables, it makes sense to simplify them by combining them into a small number of components or factors.
o PCA does this by finding optimal linear transformations. It is a mathematical technique that uses the observed data to create principal components.
o EFA assumes that there are latent factors that are not directly observed and that explain the associations among the variables. It aims to find latent, unobserved factors that account for variation in the variables.

PRINCIPAL COMPONENTS ANALYSIS

The basic idea of principal components is graphical. The goal is to find new axes to fit your data; these will represent the new components. If you have n variables, you are working in n-dimensional space. Perhaps there are new axes that will model the data more simply, and perhaps the data can be described just as well without using the full n components. To make these new axes useful, you choose the new components one by one, each chosen to explain the most variance possible.

Example: three variables graphed in three-dimensional space. Notice that there appears to be one straight line that cuts through the data. If you rotate the axes, you can visualise this straight line, which appears to explain the data very well. The red line is the first principal component, as it accounts for the most variation in the data. You can find this line and then use it as one variable, thereby replacing three variables with one while still accounting for a large proportion of the variation in the data. There may be more than one line that explains the data (more than one principal component), but you hope to keep the number of principal components low, so that you have few variables to work with and still explain a large proportion of the variation in the data.

ALGEBRAIC INTERPRETATION

The constraint on the a coefficients (their squares sum to 1) is not important in itself; it is just a technical constraint that makes it possible to find the weights for the variables that make up the principal components. What matters is that the first principal component you extract explains the maximum variance: this variance is the first eigenvalue. If necessary, you can extract a second principal component, which explains the maximum proportion of the remaining variance (the 2nd eigenvalue) and has ZERO correlation with the first principal component (it is at right angles to that line). You can continue until you have p principal components, if you have p variables. This means you are simply rotating your axes in order to explain the variation in the data with new axes. But if the majority of the variance is explained by 2-3 principal components, you would stop after these.

Why is this simpler?
o When they are first extracted, the principal components are not correlated with each other, even if the original variables are (and we need the original variables to be correlated for the technique to be useful).
o Using this technique, we can also see which component is the most important: the first component explains the largest proportion of the variance, the second component explains the second largest, and so on.

Eigenvalues are derived from the correlation matrix of the variables, via an eigendecomposition of that matrix. The correlation matrix is used, rather than the covariance matrix, so the variables are standardised before analysis. This is a mathematical technique, not a statistical technique: there are no error terms here, so it is not statistics. It is a standard manipulation of the correlation matrix of the variables to derive the principal components. Principal components analysis is especially useful when you have a large number of variables and wish to explain the same amount of variance in the observed data using fewer variables.

EXAMPLE

A 20-item teamwork scale for a training group. Participants answered the items on a Likert scale.
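What the extraction step computes can be sketched outside SPSS. The following is a minimal numpy illustration of principal components as an eigendecomposition of the item correlation matrix; the data are randomly generated stand-ins for the 20 teamwork items, not the lecture's dataset.

```python
# Minimal numpy sketch of the extraction step: principal components as an
# eigendecomposition of the correlation matrix of standardised items. The
# data below are random stand-ins, not the lecture's teamwork data.
import numpy as np

def pca_from_correlation(X):
    """X: (n_cases, n_items) array of raw item scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise the items
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)              # component loadings
    explained = eigvals / eigvals.sum()                # proportion of variance
    scores = Z @ eigvecs                               # component scores per case
    return eigvals, loadings, explained, scores

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 20))                         # 150 cases, 20 items
eigvals, loadings, explained, scores = pca_from_correlation(X)
print(np.round(eigvals[:4], 2), np.round(explained.cumsum()[:4], 2))
```

The eigenvalues and the cumulative proportion of variance printed here correspond to the Initial Eigenvalues and cumulative % columns described in the SPSS output below.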

On SPSS: Analyze > Dimension Reduction > Factor. Enter the variables you wish to derive principal components from. Under Extraction, the default method is principal components. If you run the analysis with the default settings, the output lists all 20 components. The Initial Eigenvalues column shows the variance explained by each component, and the cumulative % column shows the cumulative percentage of variance explained if all 20 components are extracted. Do not rely on SPSS's default decision (here it selects 4 components); it is usually wrong. We want to decide at what point we have captured adequate signal and the rest is noise.

HOW MANY COMPONENTS SHOULD YOU EXTRACT?

1. Kaiser's rule. This is the default option used by SPSS, but it is usually wrong. The number of factors equals the number of eigenvalues greater than 1.0; this is known as the Kaiser-Guttman rule. Intuitively, the rule means that any retained factor should account for at least as much variation as any of the original variables: each observed variable contributes one unit of variance to the total variance, so if an eigenvalue is greater than 1, that principal component explains at least as much variance as one observed variable. In practice it tends to choose about one third of the number of variables as the number of components.

2. The scree plot. This is good for graphically conceptualising the idea of signal and noise (scree is the rubble at the bottom of a cliff). The scree plot graphs the component number (x) against the eigenvalues (y). At some point the plot turns; this is the point at which the eigenvalues for your components diminish, and it is where you should stop extracting components. Obtain the scree plot under the Extraction options in SPSS. From these different tests you will obtain several different criteria for selecting the number of components, and you need to make an assessment about how many to extract.

3. The parallel test. This test enables you to move beyond a superficial examination of the scree plot. It generates random data of the same dimensions as your dataset; these data are definitely noise. You can then compare the eigenvalues for your dataset with those for the random data, and where yours are larger, this represents signal in direct comparison to noise. Where the scree plot of your data (red line) is above the 95th percentile of the random eigenvalues (blue line), you can be confident this is signal. This is beneficial because it removes the subjectivity of simply looking at the scree plot and gives you a specific decision point. It is not built into SPSS, but you can run it with a syntax script (provided on LMS); make sure the correct variables are entered in the syntax. A rough sketch of the same procedure outside SPSS is given below.
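The LMS syntax script is not reproduced in these notes, but the logic of the parallel test is easy to sketch. The following Python/numpy code is a rough illustration of Horn's parallel analysis under the usual assumptions (normally distributed random data, 95th-percentile cut-off); it is not the script used in the lecture.

```python
# Rough sketch of Horn's parallel analysis -- the logic behind the LMS
# syntax script, not the script itself. Random data of the same size as
# the real dataset are pure noise, so components are kept only while the
# observed eigenvalues exceed the 95th percentile of the random ones.
import numpy as np

def parallel_analysis(X, n_sims=1000, percentile=95, seed=0):
    """X: (n_cases, n_items) array of raw item scores."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.normal(size=(n, p))                # definitely noise
        rand[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    pctile = np.percentile(rand, percentile, axis=0)   # the "Pctile" column
    keep = obs > pctile                                # signal vs noise
    n_components = int(keep.argmin()) if not keep.all() else p
    return obs, pctile, n_components
```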

From the parallel-test output, you extract components where the eigenvalue for your data (the Raw Data column) is greater than the 95th percentile of the random-data eigenvalues (the Pctile column). Here, component 2 is the last component whose eigenvalue exceeds the random noise, so you select 2 components.

4. The MAP test. Velicer (1976) devised a test based on partial correlations, known as the Minimum Average Partial correlation test. After each component is extracted, it and those extracted before it are partialled out of the correlation matrix of the original variables, and the average of the resulting partial correlations is calculated. As more components are partialled out, the partial correlations approach zero; but at some point components that reflect only noise start being partialled out, and the average partial correlation begins to rise. You therefore choose the number of components corresponding to the minimum average partial correlation. A syntax script is used for the MAP test as well; a sketch of the calculation follows below.
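Again as a sketch of the logic rather than the LMS script itself, Velicer's MAP calculation can be written out as follows; note that the standard test averages the squared partial correlations.

```python
# Rough sketch of Velicer's (1976) MAP test. The component loadings come
# from the eigendecomposition of R; after partialling out the first m
# components, the average SQUARED off-diagonal partial correlation is
# tracked, and the number of components at its minimum is retained.
import numpy as np

def map_test(R):
    """R: (p, p) correlation matrix of the observed items."""
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    avg_sq = []
    for m in range(p - 1):                  # partial out the first m components
        A = loadings[:, :m]
        C = R - A @ A.T                     # residual covariance after partialling
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)        # rescale to partial correlations
        off_diag = partial[~np.eye(p, dtype=bool)]
        avg_sq.append(np.mean(off_diag ** 2))
    return int(np.argmin(avg_sq)), avg_sq   # retained components, MAP curve
```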

In the MAP output, you can see from the middle column that the average partial correlation decreases and then increases once noise starts being partialled out. Choose the number of components at the minimum of this value. Here the MAP test and the parallel test agree on 2 principal components, so you can have confidence that 2 is a good number. If there were a discrepancy, you would have to make a decision: run both and see whether they lead to differing interpretations, or refer to Schmitt (2011), who argues that the parallel test is more accurate, and go with that, or see which solution is the most interpretable. Syntax for the parallel and MAP tests is on LMS.

COMPONENT / FACTOR LOADINGS

Now that we know the number of components to extract is 2, you can specify this under Extraction in SPSS.
o This gives you a component matrix, with the loadings of each variable on each component.
o You can see the weights by component: within a given component, an item with a greater weight is more important than one with a lesser weight.
o Signs of the weights: everything is standardised, so if you score high on a positively weighted item, your score on component 1 is higher; if you score high on a negatively weighted item, your score on component 1 decreases.

This is still fairly complicated, with loadings for all 20 items on each component. You want to work out what component 1 actually means, so we need another step to make this simpler: rotate again.

COMPONENT ROTATION: SIMPLIFYING INTERPRETATION

Once you have extracted a number of components that is smaller than your total number of variables, you can rotate again in the remaining dimensions and obtain an even simpler interpretation. We rotate again so that the components are easier to interpret: ideally each component ends up with large loadings on some variables and small loadings on the others. In SPSS you do this under the Rotation option. Varimax is an orthogonal rotation method: it keeps the axes at right angles and does not allow correlation between the components (some rotation methods allow correlation, others do not). A sketch of the varimax calculation is given below. The rotation produces a rotated component matrix in the output, which increases the larger loadings and decreases the smaller loadings to aid interpretation. To aid this further, you can ask SPSS to suppress small coefficients.
o These are still included in the analysis; they are just not reported in the output. Suppressing coefficients below 0.3 is generally acceptable.
o At this point, you can interpret each component with reference to the original variables.

COMPONENT INTERPRETATION

You look at the loadings greater than about 0.6, with reference to the original items, and work out what is common to all those items; this commonality represents the component variable. The component is thus a particular way of adding up the different variables, with derived weights that aid interpretation of the data.

TYPES OF ROTATION

Orthogonal rotation: components / factors DO NOT correlate. The components are at right angles to one another. With an orthogonal rotation, loadings are equivalent to correlations between the observed variables and the components.
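For readers who want to see what the rotation is doing, here is a minimal sketch of the standard varimax algorithm in numpy; it is not SPSS's implementation, and the stopping rule and starting values are the usual textbook choices.

```python
# Minimal sketch of the standard varimax algorithm (not SPSS's own code):
# orthogonally rotate the retained loadings so that each item loads
# strongly on one component and weakly on the others.
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    p, k = loadings.shape
    rotation = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # singular value decomposition of the varimax criterion's gradient
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        new_total = s.sum()
        if new_total < total * (1 + tol):   # stop when the criterion plateaus
            break
        total = new_total
    return loadings @ rotation              # the rotated component matrix
```

Applied to the 20 x 2 unrotated loading matrix from the extraction step, this returns a rotated component matrix in which loadings below about 0.3 can then be suppressed for interpretation, as described above.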

Oblique rotation: components / factors DO correlate. The component axes converge on one another in the graph. With oblique rotations you also get a factor correlation matrix in the SPSS output, which tells you the extent to which your factors correlate, along with a pattern matrix and a structure matrix. Focus on the PATTERN MATRIX: the structure matrix is the product of the pattern matrix and the factor correlation matrix, but you use the pattern matrix for interpretation. Schmitt (2011) strongly recommends oblique rotation methods, as in practice components usually correlate. In practice, try both rotation methods; if the correlation between the components turns out to be quite small, revert to an orthogonal rotation.

EXPLORATORY FACTOR ANALYSIS

EFA is much more important in psychological research than principal components analysis, which is an older technique. This is because much of what we deal with in psychological research is latent constructs that we assume explain our observations.
o Eg intelligence: there are many different measures of intelligence, and we assume this latent variable explains observations on intelligence tests.
Factor analysis is a technique for searching for such underlying factors, so it is closely related to measurement. Factor analysis assumes these latent constructs exist; you need a theoretical argument to justify what construct might exist. It is particularly useful when you want to estimate underlying factors or constructs that cause the associations among your variables but cannot be measured directly.

THE COMMON FACTOR MODEL

We have the common factor, to which each individual measure relates, and a specific factor for each measure, which explains that measure's variation away from the common factor (the error term).

We have observed variables and assume there are k common factors that explain the observations on these variables. Each observed variable is the product of the common factor and the degree to which the variable relates to that factor (its loading), with an error term added.

Assumptions of this EXPLORATORY factor analysis (these differ for confirmatory factor analysis):
o Common factors are standardised (variance = 1).
o Common factors are uncorrelated.
o Specific factors are uncorrelated.
o Common factors are uncorrelated with the specific factors.

The common factor model is interpreted using structural equation modelling conventions. We have an unseen common factor (eg neuroticism) at the top. The four Xs are the different observations or measures (eg different neuroticism tests, or different items on a neuroticism test). Each observation is related to the common factor to the value of its lambda coefficient; different lambda coefficients indicate differing degrees of prediction by the common factor. The u coefficients represent the variation in the observations that is not explained by the common factor (the error terms). Greater error terms mean smaller lambda coefficients.

UNDERLYING RATIONALE: PARTIAL CORRELATIONS

This technique uses the correlations between items, given the influence of the explanatory factor. Suppose a correlation between two items on an extraversion scale:
o Item 1: "Don't mind being the centre of attention."
o Item 2: "Feel comfortable around people."
o Correlation of 0.82 between item 1 and extraversion.
o Correlation of 0.75 between item 2 and extraversion.
The partial correlation is r12.E = (r12 - r1E * r2E) / sqrt((1 - r1E^2)(1 - r2E^2)), where r12.E is the partial correlation between items 1 and 2 controlling for Extraversion, r12 is the correlation between items 1 and 2, r1E is the correlation between item 1 and Extraversion, and r2E is the correlation between item 2 and Extraversion. In the lecture example this calculation gives zero: the correlation between items 1 and 2, controlling for extraversion, is zero, so their correlation is FULLY EXPLAINED by extraversion. (With r1E = 0.82 and r2E = 0.75, the partial correlation is exactly zero when r12 = 0.82 x 0.75 = 0.615.)

The goal of factor analysis, therefore, is to find a latent variable which will account for the observed correlations: find the point at which the partial correlations are zero. We try to find a model (which PCA does not do; PCA is simple mathematics) that best captures the covariance matrix and renders the partial correlations zero. The aim is to find a latent or unobserved variable which, when correlated with our observed variables, leads to partial correlations between the observed variables that are as close to zero as possible.
o The computer algorithm estimates the lambda coefficients so that the partial correlations are as near to zero as possible.
If each variable (the Xs in the diagram) is to vary, it can only vary through the extent to which it relates to the latent factor (lambda) or to its specific factor (the u term). This is conceptualised below.
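The diagram itself is not reproduced in these notes, but the variance partition it illustrates can be written out. For a single common factor (standard common factor algebra, stated here rather than copied from the slide):

$$\operatorname{Var}(X_i) = \underbrace{\lambda_i^2}_{\text{communality } h_i^2} + \underbrace{\psi_i}_{\text{uniqueness (specific factor } u_i)}$$

and with $k$ common factors the communality is $h_i^2 = \sum_{j=1}^{k} \lambda_{ij}^2$, the "sum of the squared lambda terms" referred to below.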

We want high communality: the communality is the variation due to the common factor (the sum of the squared lambda terms).

FUNDAMENTAL EQUATION OF FACTOR ANALYSIS

Σ = ΛΛ′ + Ψ. The left-hand term, Σ, is the covariance matrix of all the observed variables. We are trying to find an optimal model that separates it into the factor-loading variance (ΛΛ′) and the specific-factor variance (Ψ). In other words, we are modelling the covariance matrix.

PERFORMING THE FACTOR ANALYSIS IN SPSS

Instead of principal components under the Extraction tab, choose one of the other options, usually maximum likelihood or generalised least squares, which are exploratory factor analysis methods. There are issues here with whether or not the covariance matrix is readily factorisable. PCA always works because it is simple mathematics; as EFA is a statistical model, the conditions of the data need to be right for it to work correctly.

TECHNICAL ISSUES WITH RUNNING EFA

Sample size: research has attempted to provide rules of thumb for sample size to ensure useful results.
o A key paper (Guadagnoli and Velicer, 1987) showed that absolute sample size and the communalities (the size of the factor loadings) are the most important things.
o Where the component loadings are high (above 0.60) and there are four or more variables per component, the sample size could be as low as 50. Generally, though, you would want around 150 participants for this to work.

Costello & Osborne (2005) state that ideally an EFA should have:
o High communalities for each item (above 0.8 would be excellent, but lower values are more common). If an item has a communality lower than about 0.4, it could be removed, as it does not fit with the other items; but don't remove items too liberally in the assignment just because they have low communalities.
o Few cross-loadings: not many items that load more than around 0.32 on more than one factor. Eg the Big 5 personality traits are intended not to correlate with one another. In reality you will have items that cross-load; you can tolerate some cross-loadings, but you would not have them in an ideal dataset.
o More than three strongly loading items per factor.
If these conditions do not apply, a bigger sample size could help, but in practice you cannot increase your sample size after you have collected the data.
o Schmitt (2011) recommends the Muthen-Muthen simulation method for deciding how much power your sample size gives you, but this is implemented in MPlus, not SPSS.

COMMUNALITY

Communalities are important, but they are only known after the factor loadings have been found. There are, however, various diagnostics you can use to judge whether your covariance matrix will be factorisable before finding the factor loadings:
o Low correlations lead to low factor loadings.
o Bartlett's test of whether the correlations = 0. You want to reject this.
o The anti-image correlation matrix.
o Kaiser's measure of sampling adequacy.
You can run these pre-diagnostics in SPSS under the Descriptives tab: tick the anti-image option and the KMO and Bartlett's test option.

GUTTMAN-KAISER IMAGE APPROACH

Image analysis partitions the variance of an observed variable into common and unique parts:
o Correlations due to the common parts are the image correlations.
o Correlations due to the unique parts are the anti-image correlations. You want these to be NEAR ZERO.
o In the anti-image correlation table in the SPSS output, you want big numbers on the diagonal. The diagonal is the MSA value for each variable (you want this close to 1), and the off-diagonal entries are the anti-image correlations (you want these close to zero). This tells you that the correlations / covariances in the data should be factorisable.
o It is good to report this, eg: the anti-image correlations were close to zero, none were greater than the diagonal, and the smallest diagonal (MSA) value was 0.8.
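If you want to check these pre-diagnostics outside SPSS, the standard formulas can be coded directly. The sketch below is my own illustration using the usual textbook definitions (not the SPSS source): Bartlett's test of sphericity and the overall KMO statistic computed from a correlation matrix R and sample size n.

```python
# Rough numpy/scipy sketch of the two pre-diagnostics, using the usual
# textbook formulas: Bartlett's test of sphericity and the overall KMO
# measure based on anti-image (partial) correlations.
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """R: correlation matrix; n: number of cases. Want a significant result."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

def kmo(R):
    """Overall Kaiser measure of sampling adequacy. Want a value close to 1."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)              # anti-image correlations
    np.fill_diagonal(partial, 0.0)
    r_sq = (R - np.eye(R.shape[0])) ** 2         # squared off-diagonal correlations
    return r_sq.sum() / (r_sq.sum() + (partial ** 2).sum())
```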

KAISER'S MEASURE OF SAMPLING ADEQUACY

You don't need to know the formula; the output will report a measure of your sampling adequacy. The higher the value, the better. The sampling adequacy statistic here is very high, which is a good result. Bartlett's test: you want this to be significant, and it usually is. It is good to report it, but it is better to understand and report the KMO with reference to the anti-image correlation matrix. These diagnostic techniques give you a sense of whether you can factorise your variables.

METHODS FOR FINDING FACTORS

Schmitt (2011) recommends the use of two SPSS methods for EFA:
o Maximum likelihood (ML), and
o Principal axis factoring (PAF).
Try different methods; if you obtain consistent results, this is good. The maximum likelihood method has a test of fit. Statistical interpretation of the fit statistics requires the data to be multivariate normal, but factor loadings can always be calculated whether the data are normal or not.
o The chi-square value tests the reconstructed correlations (from the factors) against the correlations in the data.
o We want the data to be matched by the model, so we DO NOT want the goodness-of-fit test to be significant. We WANT probabilities GREATER than our significance level. BUT the chi-square value is very sensitive to sample size, so with larger samples it picks up trivial mis-specifications too easily. If the test says the model does not fit, you should look at the RMSEA.
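The RMSEA has to be calculated by hand. A commonly used form of the formula (the standard definition, not copied from the slides) is:

$$\text{RMSEA} = \sqrt{\frac{\max(\chi^2 - df,\; 0)}{df\,(N - 1)}}$$

where the chi-square and its degrees of freedom come from the goodness-of-fit table and N is the sample size.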

SPSS will not produce the RMSEA, but you can compute it yourself: all the values you need come from the goodness-of-fit table except the sample size.
o Sometimes the degrees of freedom can be greater than the chi-square; in that case the RMSEA is treated as 0.
o The RMSEA is an overall error term: low values are good.

Heywood cases in the factor loadings are a technical problem: you have probably extracted too many factors. It means the computer has inserted a correction because a loading was estimated at a value implying there is no unique variance.

SUGGESTED STEPS IN EFA

Check your data by examining the MSA values. Find the most likely number of factors, using one of the better rules such as the parallel test; where there is some doubt, find solutions for your best estimate and for one more and one fewer factors. Use a common factor method when you want to interpret the factors; if you just want a weighted sum of variables, use principal components. Try oblique rotations first; if there are no substantial correlations between the factors, consider using a simpler orthogonal rotation. Unique to EFA are the questions of whether the model will work (the anti-image analyses) and whether the model fits (the RMSEA). You can also save the factors as new variables and then run further analyses using them, eg do males and females differ on the new factor (whatever it is argued to represent)? This would be an independent-samples t-test.

FACTOR SCORES

Having identified the factors, we can estimate factor scores for each case.
o One option: form the sum of scores for the items that load on the factor. This assumes equal weights for each item (a tau-equivalent test); if this is not true, alpha is a serious underestimate.
o Another option allows varying factor loadings; such measures are known as congeneric tests. SPSS provides three ways of doing this (under the Save option): regression is the default, the Bartlett method is recommended, and Anderson-Rubin is misleading for oblique solutions as it assumes uncorrelated scores.
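As an illustration of what the Regression option is doing, here is a sketch of the regression (Thurstone) method for factor scores in numpy, assuming orthogonal (uncorrelated) factors; for oblique solutions the structure matrix would replace the loading matrix. This is the textbook formula, not SPSS's code.

```python
# Sketch of the regression (Thurstone) method for factor scores, assuming
# orthogonal factors. Scoring weights are R^{-1} * Lambda; each case's
# scores are its standardised item scores times those weights.
import numpy as np

def regression_factor_scores(Z, R, loadings):
    """Z: (n_cases, n_items) standardised item scores; R: item correlation
    matrix; loadings: (n_items, n_factors) factor loading matrix."""
    weights = np.linalg.solve(R, loadings)   # solves R * W = Lambda
    return Z @ weights                       # one score per case per factor
```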

In practice, the method you use may not matter much.

COMPARING PCA AND EFA

Principal components analysis:
o You do not claim that there is some underlying construct that you are measuring; there are simply themes in the data that you extract.
o It works with observed variables.
o Components are weighted composites of the observed variables, so they are also observed variables.
o If a variable is added to or removed from the analysis, the components may change.
o If another component is added or removed, the other component loadings do not change.

Exploratory factor analysis:
o You can claim that you are investigating an underlying factor that causes the correlations among your variables.
o Factors are latent variables superordinate to the observed variables: they cause the observed variables to correlate.
o If an observed variable is added to or removed from the analysis, the other loadings should not change.
o If another factor is added or removed, the factor loadings of the others will change.

Why the difference? It is an issue of the diagonal elements of the correlation matrix.
o In component analysis, the value 1.0 is used on the diagonal, and the aim is to explain all the variance of each variable.
o In factor analysis, the diagonal element is the communality, and the aim is to explain only the common variance of each variable.

Comparing EFA and PCA for the data in the example, the interpretation is similar. One reason to use PCA over EFA is that factor analysis may not work for technical reasons, whereas PCA will always work, because you are not modelling anything, simply collapsing observed variables. If you cannot get a factor analysis to run, report this and say that you will run a PCA instead. If you are only looking for themes in the data, then PCA is acceptable; but where you measure correlations with the idea that they reflect some underlying or latent variable, you should run an EFA.

INTERPRETING PCA COMPONENTS VS EFA FACTORS

Yes, some people say that components should not be interpreted, but I don't see why not, and I also question the point of extracting components and using them as variables if you decide a priori that the components carry no interpretation at all. The subtle difference is that EFA assumes there is at least one latent factor, a latent construct which the observed variables "measure". PCA does not assume this, so the interpretation of a component is something like (as I said in the lecture) "a theme in the data"; there is no claim that there is a latent construct such as intelligence or self-esteem. Consider the cigarette example with which I started the PCA lecture. There are three physical properties of cigarettes, but they are correlated so strongly that the data fall pretty much on a straight line. There is no reason why I cannot "interpret" the component (the straight line) as indicating a dimension on which the cigarettes have more or less of the physical properties in question; it would be silly not to, because that is indeed what it is. But there is no latent construct here. Of course, it may well be possible to relate the physical-properties dimension to health outcomes for heavy smokers, so this is a sensible and useful piece of analysis.


Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 29 Multivariate Linear Regression- Model

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

My data doesn t look like that..

My data doesn t look like that.. Testing assumptions My data doesn t look like that.. We have made a big deal about testing model assumptions each week. Bill Pine Testing assumptions Testing assumptions We have made a big deal about testing

More information

Hierarchical Factor Analysis.

Hierarchical Factor Analysis. Hierarchical Factor Analysis. As published in Benchmarks RSS Matters, April 2013 http://web3.unt.edu/benchmarks/issues/2013/04/rss-matters Jon Starkweather, PhD 1 Jon Starkweather, PhD jonathan.starkweather@unt.edu

More information

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. Contingency Tables Definition & Examples. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. (Using more than two factors gets complicated,

More information

WELCOME! Lecture 13 Thommy Perlinger

WELCOME! Lecture 13 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 13 Thommy Perlinger Parametrical tests (tests for the mean) Nature and number of variables One-way vs. two-way ANOVA One-way ANOVA Y X 1 1 One dependent variable

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

More information

Chapter 19: Logistic regression

Chapter 19: Logistic regression Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

More information

Phenotypic factor analysis

Phenotypic factor analysis 1 Phenotypic factor analysis Conor V. Dolan & Michel Nivard VU, Amsterdam Boulder Workshop - March 2018 2 Phenotypic factor analysis A statistical technique to investigate the dimensionality of correlated

More information

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module

More information

Walkthrough for Illustrations. Illustration 1

Walkthrough for Illustrations. Illustration 1 Tay, L., Meade, A. W., & Cao, M. (in press). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods. doi: 10.1177/1094428114553062 Walkthrough for Illustrations

More information

Big Data Analysis with Apache Spark UC#BERKELEY

Big Data Analysis with Apache Spark UC#BERKELEY Big Data Analysis with Apache Spark UC#BERKELEY This Lecture: Relation between Variables An association A trend» Positive association or Negative association A pattern» Could be any discernible shape»

More information

Factor Analysis Using SPSS

Factor Analysis Using SPSS Factor Analysis Using SPSS For an overview of the theory of factor analysis please read Field (2000) Chapter 11 or refer to your lecture. Factor analysis is frequently used to develop questionnaires: after

More information

Principal Component Analysis. Applied Multivariate Statistics Spring 2012

Principal Component Analysis. Applied Multivariate Statistics Spring 2012 Principal Component Analysis Applied Multivariate Statistics Spring 2012 Overview Intuition Four definitions Practical examples Mathematical example Case study 2 PCA: Goals Goal 1: Dimension reduction

More information

An Investigation of the Accuracy of Parallel Analysis for Determining the Number of Factors in a Factor Analysis

An Investigation of the Accuracy of Parallel Analysis for Determining the Number of Factors in a Factor Analysis Western Kentucky University TopSCHOLAR Honors College Capstone Experience/Thesis Projects Honors College at WKU 6-28-2017 An Investigation of the Accuracy of Parallel Analysis for Determining the Number

More information

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n = Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,

More information

Chapter 7: Correlation

Chapter 7: Correlation Chapter 7: Correlation Oliver Twisted Please, Sir, can I have some more confidence intervals? To use this syntax open the data file CIr.sav. The data editor looks like this: The values in the table are

More information

Psych 230. Psychological Measurement and Statistics

Psych 230. Psychological Measurement and Statistics Psych 230 Psychological Measurement and Statistics Pedro Wolf December 9, 2009 This Time. Non-Parametric statistics Chi-Square test One-way Two-way Statistical Testing 1. Decide which test to use 2. State

More information

Hypothesis Testing for Var-Cov Components

Hypothesis Testing for Var-Cov Components Hypothesis Testing for Var-Cov Components When the specification of coefficients as fixed, random or non-randomly varying is considered, a null hypothesis of the form is considered, where Additional output

More information

Manual Of The Program FACTOR. v Windows XP/Vista/W7/W8. Dr. Urbano Lorezo-Seva & Dr. Pere Joan Ferrando

Manual Of The Program FACTOR. v Windows XP/Vista/W7/W8. Dr. Urbano Lorezo-Seva & Dr. Pere Joan Ferrando Manual Of The Program FACTOR v.9.20 Windows XP/Vista/W7/W8 Dr. Urbano Lorezo-Seva & Dr. Pere Joan Ferrando urbano.lorenzo@urv.cat perejoan.ferrando@urv.cat Departament de Psicologia Universitat Rovira

More information

Interactions and Centering in Regression: MRC09 Salaries for graduate faculty in psychology

Interactions and Centering in Regression: MRC09 Salaries for graduate faculty in psychology Psychology 308c Dale Berger Interactions and Centering in Regression: MRC09 Salaries for graduate faculty in psychology This example illustrates modeling an interaction with centering and transformations.

More information