Intermediate Social Statistics


Lecture 5: Factor Analysis
Tom A.B. Snijders, University of Oxford, January 2008

This course is taught by Raymond Duch and Tom Snijders.
Computer classes by David Armstrong and Mark Pickup.
Course websites:
http://www.raymondduch.com/ (see teaching)
http://www.stats.ox.ac.uk/~snijders/iss.htm

Today: Factor Analysis.

Factor analysis

Factor analysis is used for two broad purposes.
1. Measurement (confirmatory factor analysis). The classical example is the measurement of intelligence by Spearman (1904).
2. Compression of information (exploratory factor analysis): reduction of several variables to a much smaller number that contain the same information.
Here we primarily treat confirmatory factor analysis.

Confirmatory factor analysis

The researcher postulates a latent variable which cannot be observed directly, but of which indicators can be observed. Examples:
1. Intelligence measurement. Indicators: scores on tasks depending on intelligence.
2. Left-right political attitudes. Indicators: agreement with policy-related statements.

The simplest FA model has one factor F and a number (say p) of observed indicators X_i; they are related by the equation

    X_i = a_{i0} + a_{i1} F + U_i    (i = 1, ..., p),

where the U_i are the residuals.
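
A minimal simulation of this one-factor model may help fix ideas. This is a Python sketch, not part of the original slides; the loadings and sample size are illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 1000, 4                       # sample size and number of indicators
    a = np.array([0.8, 0.7, 0.6, 0.5])   # hypothetical loadings a_{i1}
    sigma = np.sqrt(1 - a**2)            # residual SDs chosen so Var(X_i) = 1

    F = rng.normal(size=n)               # latent factor with unit variance
    U = rng.normal(size=(n, p)) * sigma  # mutually uncorrelated residuals
    X = F[:, None] * a + U               # X_i = a_{i1} F + U_i  (a_{i0} = 0)

    # Off-diagonal sample correlations are approximately a_i * a_j,
    # the structure derived below.
    print(np.round(np.corrcoef(X, rowvar=False), 2))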

[Path diagram (one-factor model): the latent factor F points to the observed indicators x_1, ..., x_p with loadings a_1, ..., a_p; each indicator x_i also receives an error term U_i.]

The factor F and the residuals U_i are unobserved. Is it possible to estimate and test this model?

Distributional assumptions
1. The factor F and the residuals U_i are all uncorrelated, and have expected values 0.
2. The factor F has unit variance.
3. Possibly: the factor F and the residuals U_i all have normal distributions.

The assumption of linear relations between the factors and the observed variables is essential for FA. The assumption of normal distributions is necessary for statistical testing.

A consequence of the linearity and non-correlation assumptions, for the one-factor model, is

    Cov(X_i, X_j) = Cov(a_i F + U_i, a_j F + U_j) = a_i a_j    (i ≠ j),

and hence

    Corr(X_i, X_j) = a_i a_j / ( S.D.(X_i) S.D.(X_j) ).

This means that all the rows of the correlation matrix of X are proportional, and similarly all the columns are proportional. Also, perhaps after multiplying some of the variables by -1 to obtain the correct polarity, all correlations are positive.

The general linear factor model

It is a very strong requirement to explain everything in a set of variables (except for very small sets) by one common factor. A more widely applicable model is the linear factor model with a general number q of factors:

    X_i = a_{i0} + a_{i1} F_1 + a_{i2} F_2 + ... + a_{iq} F_q + U_i    (i = 1, ..., p).

It is usual to standardize the observed variables to have zero means (a_{i0} = 0) and unit variances. In matrix notation,

    X = A F + U,    where Cov(F) = I and Cov(U) is diagonal.
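
The proportionality of rows can be checked numerically; a small sketch with the same illustrative loadings as above:

    import numpy as np

    a = np.array([0.8, 0.7, 0.6, 0.5])   # hypothetical loadings
    R = np.outer(a, a)                   # implied correlations a_i * a_j
    np.fill_diagonal(R, 1.0)             # unit variances on the diagonal

    # Rows are proportional off the diagonal: the ratio of row 1 to row 2
    # is constant and equals a_1 / a_2.
    print(R[0, 2:] / R[1, 2:])           # [1.1428..., 1.1428...] = 0.8 / 0.7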

The parameters a_{ij} are called the factor loadings. Put in a matrix, they form the pattern matrix:

        [ a_11  a_12  ...  a_1q ]
    A = [ a_21  a_22  ...  a_2q ]
        [ ....  ....  ...  .... ]
        [ a_p1  a_p2  ...  a_pq ]

The default models have uncorrelated factors, Cov(F) = I, but the interpretation may become more attractive when correlated factors are allowed.

The standard distributional assumptions now are:
1. The factors F_j are uncorrelated with the residuals U_i, and the residuals are mutually uncorrelated.
2. The factors F_j and residuals U_i all have expected values 0.
3. The factors F_j have unit variances.
4. Possibly (1): the factors F_j are mutually uncorrelated.
5. Possibly (2): the factors F_j and residuals U_i have multivariate normal distributions.

However, the assumptions about zero correlations may partially be dropped (see the later example).

The variance of X_i is given by

    Var(X_i) = Σ_{j=1}^q a_{ij}^2 + σ_i^2,    where σ_i^2 = Var(U_i).

This is 1 under the unit variance assumption. The first part,

    h_i^2 = Σ_{j=1}^q a_{ij}^2,

is explained by the factors and is called the communality of X_i; the remainder, σ_i^2 = Var(U_i), is the unique variance of X_i.

More generally:

    Cov(X) = Cov(A F + U) = A A' + Σ.

Thus we have pq + p parameters in A and Σ to model the p(p + 1)/2 parameters in Cov(X). However, for 2 or more factors the pattern matrix can be rotated to yield exactly the same fit: A R has the same fit as A if R R' = I. The resulting number of restrictions on the covariance matrix implied by the standard factor model is

    ( (p - q)^2 - (p + q) ) / 2.

These restrictions can be tested by a likelihood ratio chi-squared test.
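
A quick arithmetic check of this degrees-of-freedom count, as a Python sketch (the values of p and q are just examples):

    def fa_df(p, q):
        """Residual df of the factor model: ((p - q)^2 - (p + q)) / 2."""
        return ((p - q) ** 2 - (p + q)) // 2

    print(fa_df(6, 1))   # 9 -> matches the one-factor example later in the lecture
    print(fa_df(6, 2))   # 4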

Explained variance

For orthogonal (i.e., uncorrelated) factors, the total variance explained by factor F_j is Σ_{i=1}^p a_{ij}^2. This is usually referred to as the eigenvalue of factor F_j. For all relevant estimation methods it is largest for the first factor F_1, and decreases with j. The number of factors will be determined so that a large total explained variance

    Σ_{j=1}^q Σ_{i=1}^p a_{ij}^2

is obtained for a small number q of factors.

Rotation of factors

The rotation of factors is a linear transformation such that the factors remain uncorrelated and retain unit variances. This changes the pattern matrix, but leaves the model as a whole unchanged: it is a reparametrization of the same model. The best interpretation usually is obtained when some of the loadings are high and others are close to 0: simple structure. Procedures for rotation in view of obtaining simple structure are varimax (maximizing the variance of the squared loadings within columns) and quartimax (maximizing the variance of the squared loadings within rows). Oblique rotation methods try to do this without the restriction of uncorrelated factors.
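
Varimax is compact enough to implement directly; the sketch below uses the standard SVD-based iteration (this is not from the slides, and the matrix "loadings" is a hypothetical unrotated pattern matrix):

    import numpy as np

    def varimax(A, tol=1e-8, max_iter=500):
        """Rotate the loading matrix A towards simple structure."""
        p, q = A.shape
        R = np.eye(q)          # accumulated orthogonal rotation
        d_old = 0.0
        for _ in range(max_iter):
            L = A @ R
            # SVD step of the varimax criterion (Kaiser normalization omitted)
            G = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
            u, s, vt = np.linalg.svd(G)
            R = u @ vt
            d_new = s.sum()
            if d_new < d_old * (1 + tol):
                break
            d_old = d_new
        return A @ R

    loadings = np.array([[0.7, 0.4], [0.8, 0.3], [0.4, 0.7], [0.3, 0.8]])
    print(np.round(varimax(loadings), 2))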

Example: measurement of social capital

From Petr Matějů and Anna Vitásková, Interpersonal Trust and Mutually Beneficial Exchanges. Czech Sociological Review, 2006, Vol. 42, No. 3: 493–516.

In a study on social capital, Matějů and Vitásková (2006) remarked that social capital tends to be conceptualized in two different ways: as a characteristic of the social environment based on interpersonal and institutional trust, and as a characteristic of individuals embedded in mutually beneficial social exchanges. They proposed to measure these two kinds of social capital by means of the following questions.

1. TRUST1 = TRUST: There are only a few people I can trust completely.
2. TRUST2 = BEST: Most of the time you can be sure that other people want the best for you.
3. TRUST3 = ADVNT: If you are not careful, other people will take advantage of you.
4. EXNET1 = PRVHLP: How often, because of your job, the office you hold, or contacts you have, do other people (relatives, friends, acquaintances) turn to you to help them solve some problems, cope with difficult situations, or apply your influence for their benefit?
5. EXNET2 = GETHLP: And what about you? When you are in a difficult situation, do you think there are people who could intervene on your behalf?
6. EXNET3 = IMPORT: How important a role do useful contacts play in your life?

These questions were used in a survey of 1200 adult inhabitants of the Czech Republic, held in 2001 as an extension of the International Social Survey Programme (ISSP). The answer categories were 5-point scales.

[Correlation matrix of the six items shown on slide; not reproduced in the transcription.]

As you see, the correlations do not follow the pattern for a one-factor model.

Nevertheless, a one-factor model was fitted as a first step. (The factor loadings in the figure are incorrect.)

Note the number of parameters: with 1 factor and 6 standardized variables there are 6 free factor loadings, the 6 residual variances being determined by the loadings (σ_i^2 = 1 - a_i^2); a totally free correlation matrix of 6 variables has 15 parameters. Hence 15 - 6 = 9 residual degrees of freedom. The likelihood ratio test with χ² = 214.3, d.f. = 9, indicates an extremely poor fit.
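
For reference, the tail probability of such a likelihood ratio statistic is easy to compute; a small sketch using scipy, with the values from the slide:

    from scipy.stats import chi2

    chisq, df = 214.3, 9
    p_value = chi2.sf(chisq, df)    # survival function: P(chi-square >= 214.3)
    print(f"p = {p_value:.2e}")     # vanishingly small -> reject the model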

The underlying theory pointed to a two-factor model with loadings of the first three variables on Factor 1 and of the last three on Factor 2. This model had χ² = 49.7, d.f. = 9: a substantial improvement, but still not a good fit. Therefore some of the zero-correlation assumptions were dropped. This led to the following model, which had χ² = 3.6, d.f. = 6. This model fits too well; it is not parsimonious. Dropping the between-factor correlation leads to χ² = 6.0, d.f. = 7.
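
Nested models like these can be compared with a likelihood ratio difference test. A sketch with the slide's values, assuming the second model is the first with the between-factor correlation fixed to 0:

    from scipy.stats import chi2

    # With the between-factor correlation:    chi-square = 3.6, df = 6
    # Without the between-factor correlation: chi-square = 6.0, df = 7
    d_chisq, d_df = 6.0 - 3.6, 7 - 6
    print(f"p = {chi2.sf(d_chisq, d_df):.3f}")   # ~0.121 -> correlation not needed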

Assessment of fit

There are various ways in which the fit of an FA model can be assessed.
1. The likelihood ratio (LR) test of the implied restrictions on the covariance matrix of X.
2. Comparing the fitted with the observed covariance (correlation) matrix.
3. The LR test has high power for large sample sizes; this may lead to overly complicated (non-parsimonious) models. Various fit indices have been developed which take into account the fit between model and data, the degrees of freedom (df), and the sample size (N).

Root Mean Square Error of Approximation

The Root Mean Square Error of Approximation (RMSEA; Steiger and Lind, 1980) is a descriptive measure indicating relative lack of fit:

    RMSEA = sqrt( (χ²/df - 1) / (N - 1) ),

where N = sample size. Rule of thumb: RMSEA ≤ .05 signals good fit, > .10 poor fit.

Note that all model selection procedures not based on a test of the null hypothesis that the model holds (such as the LR test) imply that the researcher is satisfied with a model that is not true, but a good approximation.
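
Applied to the models above (with N = 1200 from the survey), the formula gives a clear verdict; a quick sketch, with the usual floor at 0 when χ² < df:

    import math

    def rmsea(chisq, df, n):
        """RMSEA = sqrt((chi-square/df - 1) / (n - 1)), floored at 0."""
        return math.sqrt(max((chisq / df - 1) / (n - 1), 0.0))

    print(round(rmsea(214.3, 9, 1200), 3))   # 0.138 -> well above the .10 cutoff
    print(round(rmsea(6.0, 7, 1200), 3))     # 0.0   -> good fit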

Many other fit indices have been developed; see the literature, e.g. the overviews in:

Karin Schermelleh-Engel, Helfried Moosbrugger, and Hans Müller, Evaluating the Fit of Structural Equation Models: Tests of Significance and Descriptive Goodness-of-Fit Measures. Methods of Psychological Research Online, 2003, 8(2), 23–74.

Xitao Fan and Stephen A. Sivo, Sensitivity of Fit Indexes to Misspecified Structural or Measurement Model Components: Rationale of Two-Index Strategy Revisited. Structural Equation Modeling, 12, 343–367.

(See the course website.)

Determination of the number of factors

The major way of obtaining a well-fitting but parsimonious model is by determining the number of factors. This is done by considering the various fit indices. Often a scree plot is made of the explained variance obtained by each consecutive (orthogonal) factor. Fine-tuning is done by setting some factor loadings to 0, and by allowing correlations between factors or between residuals.
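
A scree plot is easy to produce from the eigenvalues of the observed correlation matrix; a minimal matplotlib sketch (the data here are a random stand-in for the survey items):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 6))                     # stand-in for the item data
    R = np.corrcoef(X, rowvar=False)                  # observed correlation matrix

    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]    # eigenvalues, descending
    plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")
    plt.xlabel("Factor number")
    plt.ylabel("Eigenvalue (explained variance)")
    plt.title("Scree plot")
    plt.show()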

Factor scores

Often it is important to estimate the values of the latent variables. Since statisticians use the term estimation for trying to approximate parameters, and prediction for trying to approximate values of random variables, this is technically called the prediction of the latent variables. The predictors are called factor scores.

The factor scores can be predicted by the conditional means of the latent variables, given the observed variables. Interpretation of factors can be done using factor loadings and using factor scores; for orthogonal factors this will lead to the same picture, while for correlated factors there are differences in interpretation.
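
Under the normality assumption the conditional mean E(F | X) has a closed form, A' Cov(X)^{-1} X for centered X. A sketch of these "regression" factor scores, reusing the illustrative loadings from earlier (not estimates from the slides):

    import numpy as np

    A = np.array([[0.8], [0.7], [0.6], [0.5]])   # hypothetical pattern matrix (p x q)
    Psi = np.diag(1 - (A ** 2).sum(axis=1))      # unique variances, so Var(X_i) = 1
    Sigma = A @ A.T + Psi                        # implied covariance matrix of X

    def factor_scores(X):
        """Regression predictor E(F | X) = A' Sigma^{-1} X, rows of X centered."""
        return X @ np.linalg.solve(Sigma, A)

    x = np.array([[1.0, 0.5, 0.2, -0.3]])       # one standardized observation
    print(factor_scores(x))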

Principal Component Analysis (PCA)

A method related to Factor Analysis, but without the assumption of the existence of latent variables, is Principal Component Analysis. Here the observed variables are again X_1, X_2, ..., X_p; now the purpose of the method is to obtain q variables that are linear combinations of X_1, ..., X_p and from which the original X_i can be optimally predicted. This is similar to Factor Analysis without residuals: the vector of components is defined as A'X; the observed covariance matrix of X is approximated by a matrix of the form A A', and the number q of components is determined such that the total explained variance is high for a low q.

Factor Analysis Glossary (1)

Thanks to Dave Armstrong!

Factor Loading: the coefficient a_{ij} relating the unobserved variable F_j to the observed variable X_i.

Factor Pattern Matrix: the matrix of factor loadings,

        [ a_11  a_12  ...  a_1q ]
    A = [ a_21  a_22  ...  a_2q ]
        [ ....  ....  ...  .... ]
        [ a_p1  a_p2  ...  a_pq ]

Communality: the amount of variance of the observed variable X_i shared with the other variables; usually denoted h_i^2 = Σ_{j=1}^q a_{ij}^2.

Uniqueness: the amount of an observed variable's variance not shared with the other variables.
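
A bare-bones PCA via the eigendecomposition of the correlation matrix makes the contrast with FA concrete; a sketch with stand-in data:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 6))                 # stand-in for the observed data
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the variables

    R = np.corrcoef(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]             # sort components by variance
    eigvals, A = eigvals[order], eigvecs[:, order]

    q = 2
    components = Xs @ A[:, :q]                    # component scores A'X
    print("explained:", eigvals[:q].sum() / eigvals.sum())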

Factor Analysis Glossary (2)

Eigenvalue: in an unrotated factor solution, the amount of variance explained by each factor.

Rotation: factor solutions are only identified up to a rotation, meaning there are infinitely many solutions that are equally good in terms of variance explained (ability to reproduce the correlation matrix). Rotating means moving the factors around in space, often so that they explain the same amount of variance but also have other desirable properties.

Factor Structure Matrix: the asymmetric matrix of correlations between the observed variables and the factors. This is the same as the Factor Pattern Matrix for orthogonal factors, but the two are not the same when we allow the factors to be correlated.