Applied Multivariate Analysis

Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017

Dimension reduction: Exploratory Factor Analysis (EFA)

Background
The motivation in PCA is to replace the original (correlated) variables by a small number of uncorrelated new variables. These are linear combinations of the old ones, chosen so that they capture the major part of the variance of the original variables. The aim in EFA, by contrast, is to find underlying latent factors that explain the correlations between the original variables. Thus, whereas PCA is purely a mathematical transformation that produces new variables, EFA is more model based (the observed variables reflect some latent constructs). The history of factor analysis dates back to Spearman (1904) [1].

[1] Spearman, C., 1904, General intelligence, objectively determined and measured, American Journal of Psychology 15, 201-293.

Background
The idea is that there is an unobserved latent construct that explains the correlation of the observed variables (in factor analysis the observed variables are often also called items). Spearman examined correlations between the marks in three subjects [Classics ($x_1$), French ($x_2$), and English ($x_3$)] obtained from a sample of children. His hypothesis was that there is one latent construct, which he named general intelligence ($f$), that governs success in the tests.

Single latent factor
Accordingly, he ended up with the model

\[
\begin{aligned}
x_1 &= \lambda_1 f + u_1 \\
x_2 &= \lambda_2 f + u_2 \\
x_3 &= \lambda_3 f + u_3.
\end{aligned}
\tag{1}
\]

In this system $\lambda_i$ is called a loading. It indicates how strongly the underlying latent factor $f$ is reflected by the observed (or manifest) measurement $x_i$. The term $u_i$ is a unique factor that is specific to the variable $x_i$. Accordingly, it is assumed that the $u_i$s are not correlated with each other (and of course not with the common factor $f$). Thus, $f$ is supposed to explain all the interrelations between the observed variables.
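A short derivation, using only the assumptions stated above (the $u_i$s are uncorrelated with each other and with $f$), may help to show why $f$ accounts for all the interrelations. For $i \neq j$:

```latex
\[
\begin{aligned}
\operatorname{Cov}(x_i, x_j)
  &= \operatorname{Cov}(\lambda_i f + u_i,\; \lambda_j f + u_j) \\
  &= \lambda_i \lambda_j \operatorname{Var}(f)
     + \lambda_i \operatorname{Cov}(f, u_j)
     + \lambda_j \operatorname{Cov}(u_i, f)
     + \operatorname{Cov}(u_i, u_j) \\
  &= \lambda_i \lambda_j \operatorname{Var}(f),
\end{aligned}
\]
\[
\operatorname{Var}(x_i) = \lambda_i^2 \operatorname{Var}(f) + \operatorname{Var}(u_i).
\]
```

So the covariance between two observed variables depends only on their loadings and the variance of the common factor, while the unique factors contribute only to the variances.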

Several latent factors
This idea generalizes to several underlying latent variables. The general mathematical representation of a factor structure for $p$ observed variables and $k$ ($< p$) factors is

\[
\begin{aligned}
x_1 &= \lambda_{11} f_1 + \lambda_{12} f_2 + \cdots + \lambda_{1k} f_k + u_1 \\
x_2 &= \lambda_{21} f_1 + \lambda_{22} f_2 + \cdots + \lambda_{2k} f_k + u_2 \\
    &\;\;\vdots \\
x_p &= \lambda_{p1} f_1 + \lambda_{p2} f_2 + \cdots + \lambda_{pk} f_k + u_p,
\end{aligned}
\tag{2}
\]

which in matrix format becomes

\[
x = \Lambda f + u.
\tag{3}
\]

Here $x$ contains the x-variables, $\Lambda$ contains the loadings, $f$ contains the factors, and $u$ contains the unique factors, which are also called errors in the measurement.
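The matrix form can be illustrated with a small simulation. The sketch below is only an illustration under added assumptions that are not part of the slides: the loading values are made up, the factors are standardized and uncorrelated, and the variables are normally distributed with unit variance. Under these assumptions the model-implied covariance matrix is $\Lambda\Lambda' + \Psi$, where $\Psi$ is the diagonal matrix of unique variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loading matrix (p = 5 variables, k = 2 factors); values are illustrative.
Lambda = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.3],
    [0.0, 0.8],
    [0.0, 0.7],
])

# Assumption: unit-variance x, so the unique variance is 1 minus the communality.
psi = 1.0 - np.sum(Lambda**2, axis=1)

n = 200_000
f = rng.standard_normal((n, 2))                  # uncorrelated standardized factors
u = rng.standard_normal((n, 5)) * np.sqrt(psi)   # mutually uncorrelated unique factors
x = f @ Lambda.T + u                             # x = Lambda f + u, one row per observation

implied = Lambda @ Lambda.T + np.diag(psi)       # model-implied covariance matrix
sample = np.cov(x, rowvar=False)                 # covariance estimated from the simulated data
max_gap = np.max(np.abs(sample - implied))
print(f"largest absolute gap between sample and implied covariance: {max_gap:.4f}")
```

With a large simulated sample the gap is small, which is what the model predicts: the common factors reproduce the off-diagonal covariances, and the unique factors only add to the diagonal.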

Steps in FA
The exploratory nature of the approach stems from the fact that initially it is not clear how many factors there are. The steps in factor analysis are:
- Find the number of factors.
- Rotate the initial solution to figure out what the factors are (orthogonal rotation or oblique rotation).
- Check the goodness of fit of the factor model; if needed, remove variables to facilitate interpretation of the final model.
- Interpret the factor solution.

Approaches to extract factor structures
There are different approaches to extracting a factor structure and finding the number of factors. A popular solution is mathematically identical to the principal component solution, which amounts to computing the eigenvalues and eigenvectors of the covariance or correlation matrix. Another popular solution nowadays is the so-called maximum likelihood (ML) approach. The ML method relies on normality of the variables and allows the number of factors to be tested statistically.
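The principal component solution can be sketched in a few lines. This is a minimal illustration, not the SAS procedure used in the course: the correlation matrix `R` and the choice `k = 2` are made-up values, and the loadings are taken as eigenvectors scaled by the square roots of their eigenvalues.

```python
import numpy as np

# Hypothetical correlation matrix of four items (illustrative values only).
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.1],
    [0.5, 0.4, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # reorder from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                         # number of retained factors (an assumption)
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
communalities = np.sum(loadings**2, axis=1)   # variance of each item explained by the k factors

print(np.round(eigvals, 3))
print(np.round(loadings, 3))
print(np.round(communalities, 3))
```

The sum of squared loadings in each column equals the corresponding eigenvalue, which is why eigenvalues are read as the variance explained by each factor.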

Measure of Sampling Adequacy (MSA)
As a preliminary check, the Measure of Sampling Adequacy (MSA), also known as the Kaiser-Meyer-Olkin (KMO) measure, is sometimes used as a criterion to judge whether the data are appropriate for factor analysis.

  MSA value       Rating
  0.00 to 0.49    unacceptable
  0.50 to 0.59    miserable
  0.60 to 0.69    mediocre
  0.70 to 0.79    middling
  0.80 to 0.89    meritorious
  0.90 to 1.00    marvelous
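The slides do not give the formula, so the following is a sketch based on the usual definition of the overall KMO measure: the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations (each pair controlling for all other variables). The function names `kmo` and `kmo_label` and the example matrix are hypothetical.

```python
import numpy as np

def kmo(R):
    """Overall KMO measure from a correlation matrix R (usual textbook definition)."""
    R = np.asarray(R, dtype=float)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)            # partial correlations given all other variables
    off = ~np.eye(R.shape[0], dtype=bool)    # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)

def kmo_label(value):
    """Map a KMO value to the rating scale listed above."""
    cutoffs = [(0.90, "marvelous"), (0.80, "meritorious"), (0.70, "middling"),
               (0.60, "mediocre"), (0.50, "miserable")]
    for lower, label in cutoffs:
        if value >= lower:
            return label
    return "unacceptable"

# Hypothetical example: three items with all pairwise correlations equal to 0.5.
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
value = kmo(R)
print(round(value, 4), kmo_label(value))
```

In this example each partial correlation is 1/3, so the measure is 1.5 / (1.5 + 6/9), roughly 0.69.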

Number of Factors
Determining the number of factors:
- Theory: how many factors the theory predicts.
- Eigenvalues: (i) eigenvalues > 1, (ii) scree plot, (iii) total variance explained.
- Statistical test: the ML method produces chi-square statistics.
- Criterion functions: AIC, BIC.
- Interpretation: how many factors can be interpreted.
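The eigenvalue-based criteria can be sketched as follows. The correlation matrix is a made-up example with two blocks of three items (within-block correlation 0.6, between-block correlation 0), chosen so that the eigenvalues are easy to verify by hand: 2.2, 2.2 and four times 0.4.

```python
import numpy as np

# Hypothetical block-structured correlation matrix for six items.
block = np.full((3, 3), 0.6)
np.fill_diagonal(block, 1.0)
zeros = np.zeros((3, 3))
R = np.block([[block, zeros], [zeros, block]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]      # largest first, as in a scree plot
n_kaiser = int(np.sum(eigvals > 1.0))               # criterion (i): eigenvalues > 1
cum_share = np.cumsum(eigvals) / eigvals.sum()      # criterion (iii): cumulative variance explained

print(np.round(eigvals, 3))
print("factors with eigenvalue > 1:", n_kaiser)
print(np.round(cum_share, 3))
```

Plotting `eigvals` against their rank gives the scree plot of criterion (ii); here the drop after the second eigenvalue would suggest two factors, in agreement with the eigenvalue-greater-than-one rule.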

Number of Factors
Example 1. In this example 103 police officers were rated by their superiors on 14 scales (Source: SAS/STAT 14.1 User's Guide, p. 2340). The scree test proposes 3 or 5 factors, and the likelihood test indicates that 3 or 4 factors would be appropriate. We'll start with the 4-factor solution.

Factor rotation and interpretation
Because the factor solution is not unique (with the exception of the one-factor case), it can be transformed to facilitate interpretation. The transformation is called rotation. It aims to find a representation in which each variable ideally loads on only one factor, so that in this ideal case the observed variables cluster by factors. Such a structure is called a simple structure. When it holds, each variable can be interpreted as reflecting the one underlying factor it loads on. A factor is named according to the variables that load highly (i.e., cluster) on it. Rotation can be orthogonal or oblique.
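The non-uniqueness and the search for simple structure can be illustrated with an orthogonal rotation. The sketch below uses the widely used SVD-based iteration for the raw varimax criterion (no Kaiser normalization). The function names, the loading values and the 30-degree starting rotation are illustrative assumptions, not the SAS output of the course example.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of loadings L (p x k); returns rotated loadings and T."""
    p, k = L.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Z = L @ T
        # Matrix used by the standard SVD-based varimax update.
        B = L.T @ (Z**3 - Z @ np.diag(np.sum(Z**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(B)
        T = u @ vt
        d_new = np.sum(s)
        if d_new < d * (1 + tol):   # stop when the criterion no longer improves
            break
        d = d_new
    return L @ T, T

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.sum(np.var(L**2, axis=0)))

# Hypothetical simple structure, deliberately rotated by 30 degrees to blur it.
L0 = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
               [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
L = L0 @ rot

rotated, T = varimax(L)
print(np.round(L, 2))
print(np.round(rotated, 2))
```

Both `L` and `rotated` reproduce the same matrix of common-factor covariances (the product of the loading matrix with its transpose), which is the sense in which the solution is not unique; rotation only changes how that common variance is distributed over the factors.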

Rotation
In an orthogonal rotation the factors are uncorrelated. In an oblique rotation the factors are allowed to correlate with each other, which typically results in a simpler structure. The factors, however, should not be too highly correlated (preferably < 0.7 in absolute value), as a high correlation implies that the underlying constructs are not well separated. Accordingly, if two factors are highly correlated we say that the discriminant validity of the constructs is low (weak or poor).

Rotation
There are several rotation methods.
- Examples of orthogonal rotations: Varimax and Quartimax and its variants.
- Examples of oblique rotations: Oblimin (and its variants), Quartimin, Promax, and HK (Harris-Kaiser).
Oblique rotation produces two loading matrices:
- Factor pattern, which contains the regression coefficients of the observed variables on the latent factors.
- Factor structure, which is the correlation matrix of the observed variables with the factors.
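The relation between the two matrices can be sketched numerically. Assuming standardized variables and standardized factors with correlation matrix Phi (an assumption added here, not stated on the slide), the structure matrix is the pattern matrix multiplied by Phi. All numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical factor pattern (regression coefficients of items on two oblique factors).
pattern = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.0, 0.9],
    [0.1, 0.6],
])

# Hypothetical factor correlation matrix; 0.4 is below the 0.7 guideline mentioned above.
Phi = np.array([
    [1.0, 0.4],
    [0.4, 1.0],
])

structure = pattern @ Phi                  # correlations of the items with the factors
structure_orthogonal = pattern @ np.eye(2) # with uncorrelated factors the two matrices coincide

print(np.round(structure, 3))
```

The first item has a zero pattern coefficient on the second factor but still correlates 0.32 with it, purely because the two factors are correlated. This is why both matrices are reported after an oblique rotation.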

Selecting items
To further facilitate interpretation of the factor results, some observed variables (items) can be dropped from the analysis (variable selection). Selection is based on checking:
- Communality, which is the fraction of the variance of the item that the factors explain. Ideally it should be > .5 (i.e., over 50%).
- Primary loading, which indicates how strongly each item loads on its factor (preferably > .5 in absolute value).
- Cross-loadings, which indicate how strongly an item loads on the other factors (preferably small).
- Meaningfulness, i.e., does the item contribute meaningfully to the interpretation of the factor?
- Reliability, which refers to the internal consistency of the items of each factor (Cronbach's alpha, should be > .6).
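The reliability check can be sketched with the usual formula for Cronbach's alpha: k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score. The function name and the small score table (five respondents, three items) are made up for illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # sample variance of the sum score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical scores of five respondents on the three items of one factor.
scores = np.array([
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 4],
    [3, 3, 4],
])

alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

For these made-up scores the item variances sum to 4.5 and the total score has variance 11.7, so alpha is about 0.92, comfortably above the .6 guideline.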

Eliminating items
Eliminating items from an EFA is subjective. Guidelines:
- Communalities: each ideally > .5.
- Size of the main loading: bare minimum > .4, preferably > .5, ideally > .6 (in absolute value).
- Meaning of the item (face validity): does the item contribute meaningfully?
- Contribution the item makes to the factor (i.e., is a better measure of the latent factor achieved by including or not including this item?).
- Number of items already in the factor: if there are already many items (e.g., > 6) in the factor, the researcher can be more selective about which ones to include and which ones to drop.
- Eliminate one variable at a time, then re-run, before deciding which items, if any, to eliminate next.
- Cross-loadings should be low, preferably < .3 in absolute value.
- There should be a minimum of 2 items per factor, preferably 3 or more (however, not too many, as this hampers interpretation).
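The numerical guidelines above can be turned into a simple screening aid. The sketch below only flags candidates; the decision itself remains subjective and items should still be removed one at a time with a re-run in between. The loading values and item names are hypothetical, and the communality is computed as the row sum of squared loadings, which assumes orthogonal factors.

```python
import numpy as np

# Hypothetical rotated loadings (orthogonal factors) for five items.
item_names = ["item1", "item2", "item3", "item4", "item5"]
loadings = np.array([
    [0.75, 0.10],
    [0.70, 0.25],
    [0.45, 0.40],
    [0.05, 0.80],
    [0.30, 0.35],
])

abs_load = np.abs(loadings)
communality = np.sum(loadings**2, axis=1)        # valid for orthogonal factors
primary = abs_load.max(axis=1)                   # largest absolute loading of each item
cross = np.sort(abs_load, axis=1)[:, -2]         # second-largest absolute loading

# Thresholds taken from the guidelines: communality > .5, main loading > .4, cross-loading < .3.
ok = (communality > 0.5) & (primary > 0.4) & (cross < 0.3)
flagged = [name for name, keep in zip(item_names, ok) if not keep]

for name, c, p_, x in zip(item_names, communality, primary, cross):
    print(f"{name}: communality={c:.3f} primary={p_:.2f} cross={x:.2f}")
print("candidates for elimination:", flagged)
```

Here item3 is flagged for a low communality and a high cross-loading, and item5 fails all three checks; in practice one would drop the weakest item first and re-run the analysis.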

Example
Rotating the four factors (see the SAS example on the web-site): in the police job rating example it turns out that the four factors are hard to interpret. A three-factor solution is easier to interpret:
- Factor 1: Physical skills
- Factor 2: Interpersonal skills
- Factor 3: Cognitive skills