STAT 730 Chapter 9: Factor analysis

Timothy Hanson
Department of Statistics, University of South Carolina
Stat 730: Multivariate Data Analysis

Basic idea

Factor analysis attempts to explain the correlation among a large number of variables using a small number of latent factors.

Example: Spearman (1904) considered children's exam performance in classics x1, French x2, and English x3. The estimated correlation matrix is

        ( 1.00  0.83  0.78 )
    R = ( 0.83  1.00  0.67 )
        ( 0.78  0.67  1.00 )

One can postulate a model with a common factor f:

    x1 = µ1 + λ1 f + u1
    x2 = µ2 + λ2 f + u2
    x3 = µ3 + λ3 f + u3.

In matrix terms, x = µ + Λf + u. Here f is the common underlying factor (overall ability) that explains all three scores.

The factor model

Let x ~ (µ, Σ). The factor model is

    x = µ + Λf + u,

where x ∈ R^p is the response, Λ ∈ R^(p×k) holds the factor loadings, f ∈ R^k holds the common factors (hopefully k is small, e.g. k = 1 or k = 2 if we're lucky), and u is error, also called the specific or unique factors. It is assumed that

    f ~ (0, I_k),   u ~ (0, Ψ),   Ψ = diag(ψ11, ..., ψpp),   C(f, u) = 0.

Note then that V(x) = Σ = ΛΛ' + Ψ. This must hold if the k-factor model is true.
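
A quick way to see the covariance identity V(x) = ΛΛ' + Ψ is to simulate from the model. Below is a minimal R sketch (the loadings and specific variances are made-up numbers) checking that the sample covariance of x approaches ΛΛ' + Ψ:

# simulate n draws from a k=1 factor model x = mu + Lambda*f + u (mu = 0)
set.seed(1)
n=1e5; p=3
Lambda=c(0.9,0.9,0.85)                # hypothetical loadings (p x 1)
Psi=diag(c(0.19,0.19,0.28))           # hypothetical specific variances
f=rnorm(n)                            # common factor, mean 0, variance 1
u=matrix(rnorm(n*p),n,p)%*%chol(Psi)  # specific factors, V(u) = Psi
x=outer(f,Lambda)+u
round(cov(x),2)                       # approximately equal to...
round(Lambda%*%t(Lambda)+Psi,2)       # ...Lambda Lambda' + Psi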

Motivation from PCA

Let Γ = [Γk Γp−k] be the orthogonal matrix of eigenvectors of Σ, so that ΓΓ' = I_p. Then

    x = µ + ΓΓ'(x − µ)
      = µ + [Γk Γp−k][Γk Γp−k]'(x − µ)
      = µ + Γk Γk'(x − µ) + Γp−k Γp−k'(x − µ)
      = µ + Λf + u,

with Λ = Γk, f = Γk'(x − µ), and u = Γp−k Γp−k'(x − µ). This approximately satisfies the factor model when V(u) is small, which happens when the first k principal components explain most of the variability. See MKB Section 9.8 (pp. 275-276). Note that under the general factor model, u is random scatter in all directions, not necessarily orthogonal to Λf, so this is motivation only; the models are different. However, this motivation for factor analysis leads to choosing k based on a PCA scree plot.
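
The decomposition above is an exact algebraic identity for any orthogonal Γ; only its reading as a factor model is approximate. A small R sketch verifying this numerically, using the open/closed book exam data (scor, from the bootstrap package) that appears again on the code slide below:

library(bootstrap); data(scor)
G=eigen(cov(scor))$vectors   # orthogonal: G%*%t(G) = identity
k=1
Gk=G[,1:k,drop=FALSE]        # Gamma_k
Gpk=G[,-(1:k),drop=FALSE]    # Gamma_{p-k}
xc=scale(scor,scale=FALSE)   # x - mu (centered data, rows are subjects)
Lf=xc%*%Gk%*%t(Gk)           # the "Lambda f" piece
u=xc%*%Gpk%*%t(Gpk)          # the "u" piece
max(abs(Lf+u-xc))            # zero up to rounding error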

Communalities

Recall V(x) = Σ = ΛΛ' + Ψ. Thus

    V(xi) = σii = (λi1² + ··· + λik²) + ψii = hi² + ψii,

where hi² = λi1² + ··· + λik² is the ith communality, the amount of variance of xi shared with the other variables through the common factors; what is left over is the specific (or unique) variance ψii.
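
In R, the communalities can be read off a factanal fit as the row sums of the squared loadings; since factanal works on the correlation scale, hi² + ψii ≈ 1. A short sketch using the scor data from the code slide below:

library(bootstrap); data(scor)
f=factanal(scor,factors=1)
h2=rowSums(f$loadings^2)   # communalities h_i^2
h2+f$uniquenesses          # approximately 1 on the correlation scale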

Scale invariance

Let y = Cx, where C is a diagonal matrix and x = µ + Λf + u is a k-factor model as on the last slide. Then y = Cµ + [CΛ]f + Cu, with V(y) = [CΛ][CΛ]' + CΨC. The k-factor model thus also holds for y. Unlike PCA, FA is unaffected by rescaling the outcomes.
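
A sketch illustrating this with factanal, which standardizes internally: rescaling the columns of the data changes the loadings at most by sign. The rescaling constants below are arbitrary.

library(bootstrap); data(scor)
C=diag(c(10,1,0.5,2,100))      # arbitrary diagonal rescaling
y=as.matrix(scor)%*%C; colnames(y)=colnames(scor)
f1=factanal(scor,factors=1)
f2=factanal(y,factors=1)
max(abs(abs(f1$loadings)-abs(f2$loadings)))   # essentially zero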

Non-uniqueness of Λ and f

Let G ∈ R^(k×k) satisfy GG' = G'G = I_k, and let x = µ + Λf + u be a k-factor model. Then x = µ + [ΛG][G'f] + u is also a k-factor model, with loadings ΛG and factors G'f. To achieve identifiability, constraints such as Λ'Ψ⁻¹Λ = D, where D is diagonal, are used.

After the loadings ˆΛ are estimated, they are often further rotated by some G to Δ = ˆΛG in order to maximize

    φ = Σ_{j=1}^k Σ_{i=1}^p (dij² − d̄j)²,   dij = δij / (Σ_{s=1}^k δis²)^(1/2),   d̄j = (1/p) Σ_{i=1}^p dij²,

where the δij are the elements of Δ. This is called varimax rotation, developed by Kaiser (1958), and produces a few large loadings and many near-zero loadings. Such rotations are more interpretable, as discussed in class for PCA. An alternative rotation is the promax, which produces correlated factors; it is varimax followed by an oblique (non-orthogonal) Procrustes rotation.
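
The rotations can also be applied by hand: stats::varimax and stats::promax take a loadings matrix directly. A sketch with a k = 2 fit (rotation only matters when k > 1):

library(bootstrap); data(scor)
f=factanal(scor,factors=2,rotation="none")
varimax(f$loadings)   # orthogonal rotation maximizing the criterion above
promax(f$loadings)    # oblique rotation; the factors become correlated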

Exploratory vs. confirmatory FA

Near-zero loadings lead to a natural question: are they actually zero? A researcher might have in mind one or more factors (math ability, reading ability) and hypothesize that these factors are related to only a subset of the observed variables (division, comprehension). Furthermore, the factors may or may not be correlated.

Exploratory FA seeks to find out how many factors are needed and to get a rough idea of which variables load highly on (are related to) which factors. There are no particular hypotheses in mind at this stage.

Confirmatory FA seeks to validate (or refute) a theoretical hypothesis. Specific (causal) regressions are stipulated and an overall model is fit, essentially setting some loadings to zero, forcing some correlations to be zero, etc. The number of factors is given in advance, as well as their relationship to the observed variables. The hypothesized model is then checked for overall fit, thereby confirming or denying the given hypothesis.

Fitting

When Σ is unconstrained there are p(p+1)/2 free parameters; Λ has kp parameters and Ψ has p. The identifiability constraint that Λ'Ψ⁻¹Λ be diagonal eats up k(k−1)/2 parameters, so the number of degrees of freedom for the k-factor model ends up being

    s = p(p+1)/2 − [kp + p − k(k−1)/2] = (p−k)²/2 − (p+k)/2.

One wants s > 0, which will happen when k << p; s > 0 implies that the factor model offers a simpler interpretation than allowing Σ to be completely arbitrary. When s > 0 the factor model can be estimated, yielding ˆΛ and ˆΨ from data x1, ..., xn ∈ R^p. There are two main methods:

Principal factor analysis (pp. 261-263).
Maximum likelihood (pp. 263-267), assuming x1, ..., xn iid N_p(µ, ΛΛ' + Ψ).

The MLE approach also allows a test of H0: the k-factor model is adequate vs. Ha: Σ is unconstrained. This test is described on pp. 265-268 and is automatically provided by factanal in R.
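
The parameter count is easy to check numerically; a one-line sketch:

s=function(p,k) ((p-k)^2-(p+k))/2   # degrees of freedom for the k-factor model
s(3,1)   # Spearman's data: 0, the 1-factor model is just identified
s(5,2)   # open/closed book exam data with k=2: 1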

MLE factor analysis

# Spearman's data
R=matrix(c(1,0.83,0.78,0.83,1,0.67,0.78,0.67,1),3,3)
f=factanal(covmat=R,factors=1,rotation="varimax") # varimax, promax, or none
print(f,digits=2,cutoff=.3,sort=TRUE) # compare to bottom p. 260

# open/closed book exams
library(bootstrap)
data(scor)
# note that rotation is unnecessary when k=1
plot(prcomp(scor),type="l") # k=1?
f=factanal(scor,factors=1,rotation="varimax")
print(f,digits=2,cutoff=.3,sort=TRUE) # compare to bottom p. 266

The options scores="regression" and scores="bartlett" produce factor scores (see MKB p. 274). Use the covmat= option to enter a correlation or covariance matrix directly. If entering a covariance matrix, include the option n.obs= for tests.
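
Factor scores require the raw data rather than covmat=. A short sketch continuing the example above:

f=factanal(scor,factors=1,scores="regression")
head(f$scores) # one estimated overall-ability score per student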

Structural equation modeling

Structural equation modeling (SEM) generalizes (confirmatory) factor analysis. The sem package was the first to allow routine fitting of SEMs in R; more recently the lavaan (latent variable analysis) package was introduced with much more functionality, rivaling the commercial packages Mplus and LISREL. Path diagrams can be created using semPlot, which understands lavaan modeling syntax.
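
For example, a path diagram takes one line; a sketch, where fit is a fitted lavaan model such as the one sketched after the structural submodel slide below:

library(semPlot)
semPaths(fit,what="std") # path diagram with standardized estimates on the edges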

SEM

Structural equations are simply a series of related regression models involving observed and latent variables. The distinction between predictor and response gets blurred, as variables can appear on either side of the regression equations, although each may appear on the left as a dependent variable only once. General Gaussian covariance structures are allowed for the errors. Maximum likelihood (normality-based) factor analysis is a special case.

SEMs are constructed using a hypothesized causal pathway, but it is important to note that one cannot infer causation from observational data. SEMs provide a structured means to explore possible simple causal paths, but do not confirm their existence.

LISREL model: measurement submodel

Jöreskog and Sörbom came up with the LISREL (linear structural relations) model, as well as the software, which currently costs over $700; the cost of Mplus is similar. Note that lavaan is free.

We now have two sets of latent factors (endogenous and exogenous), associated with two sets of observed variables through linear regressions. We observe (yi, xi) on subject i. The measurement equations are

    yi = Λy fi^y + ui^y,    xi = Λx fi^x + ui^x.

The latent cause-and-effect factors fi^y and fi^x are the endogenous and exogenous latent variables, respectively. In general, endogenous (dependent) variables are explained by other variables through linear regression; these appear on the left-hand side and include yi and xi. Exogenous variables appear on the right-hand side and are (conditionally) fixed explanatory variables. A variable can be either, but should appear on the left-hand side only once.

LISREL model: structural submodel

The structural submodel is

    fi^y = By fi^y + Bx fi^x + ui^m.

Here By has zeros along the diagonal; ui^x, ui^y, ui^m are mutually uncorrelated and mean-zero; and C(fi^x, ui^m) = C(fi^y, ui^y) = C(ui^x, fi^x) = 0. Also, ui^m iid (0, Ψ) independent of fi^x iid (0, Φ). The off-diagonals of Ψ and Φ can be non-zero, reflecting correlations among the latent endogenous and exogenous factors, respectively.

Confirmatory FA uses only xi and places structure on Λx as well as Φ.

lavaan can fit much more general models. For example, note that E(xi) = E(yi) = 0 above. We may also want to regress some of the observed variables onto other observed variables, as well as onto latent variables.
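
In lavaan syntax, =~ specifies a measurement equation (a factor followed by its indicators) and ~ a structural regression. A minimal sketch; all variable, factor, and data frame names here are hypothetical:

library(lavaan)
model <- '
  # measurement submodel (hypothetical indicators)
  verbal =~ vocab + comprehension + analogies
  math   =~ arithmetic + algebra + geometry
  # structural submodel: endogenous factor regressed on exogenous factor
  math ~ verbal
'
fit=sem(model,data=mydata) # mydata is a hypothetical data frame
summary(fit,fit.measures=TRUE)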

Measures of model fit

The χ² test simply compares the observed and fitted covariance matrices, yielding a p-value. However, the test may inappropriately reject in large samples and inappropriately fail to reject in small samples, so other measures of fit are often used.

The root mean square error of approximation (RMSEA) also looks at the difference between the fitted-model and observed covariances. RMSEA ranges from 0 to 1; smaller is better. lavaan provides a p-value for H0: RMSEA ≤ 0.05 (one wants to accept).

The standardized root mean square residual (SRMR) is yet another discrepancy between the observed and fitted-model covariance matrices, and also ranges from 0 to 1. Researchers look for SRMR ≤ 0.08.

All of these measures are available in lavaan.
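
These can be extracted from a lavaan fit with fitMeasures; a sketch, reusing the hypothetical fit from the previous slide:

fitMeasures(fit,c("chisq","pvalue","rmsea","rmsea.pvalue","srmr"))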