Multivariate Fundamentals: Rotation. Exploratory Factor Analysis

PCA Analysis: A Review
[Figure: precipitation, temperature, and ecosystems]

PCA Analysis with Spatial Data
Proportion of variance explained: Comp.1 + Comp.2 + Comp.3 ≈ 95%.
[Figure: loadings table]

PCA Analysis with Spatial Data: Loadings
Loadings indicate that MANY climate variables are associated with Component 1. Intuitively, these variables are associated with growth conditions.

PCA Analysis with Spatial Data: Loadings
Loadings indicate that temperature variables are associated with Component 2. Intuitively, these variables are associated with continentality.

PCA Analysis with Spatial Data: Loadings
Loadings indicate that moisture variables are associated with Component 3. Intuitively, these variables are associated with weather (the jet stream).

PCA Analysis with Spatial Data
[Figure only]

Exploratory Factor Analysis (EFA)
Objective: rotate the data so that the new axes explain the greatest amount of variation within the data (same as PCA).
But, unlike PCA, the key concept of factor analysis is that multiple observed variables have similar patterns of responses because they are all associated with a latent (i.e. not directly measured) variable.
In the context of EFA, PCA can be considered a technique that simply reduces the number of variables.
Think of EFA as: "There is a bigger picture controlling the variables I am analyzing, and I want to better understand the relationship with those underlying unobservable factors."
[Portrait: Charles Spearman (1863-1945)]

The math behind EFA
Renewable Resources MSc example: assume each grade $Y_i$ is linearly related to two factors $F_1$ and $F_2$ as follows:

$Y_1 = \beta_{10} + \beta_{11} F_1 + \beta_{12} F_2 + \varepsilon_1$
$Y_2 = \beta_{20} + \beta_{21} F_1 + \beta_{22} F_2 + \varepsilon_2$
$Y_3 = \beta_{30} + \beta_{31} F_1 + \beta_{32} F_2 + \varepsilon_3$
$Y_4 = \beta_{40} + \beta_{41} F_1 + \beta_{42} F_2 + \varepsilon_4$
$Y_5 = \beta_{50} + \beta_{51} F_1 + \beta_{52} F_2 + \varepsilon_5$

$F_j$ = unobservable factor (e.g. writing ability, mathematical ability)
$\varepsilon_i$ = error term
$\beta_{ij}$ = loadings (these come from PCA)

Student ID   Ecology (Y1)   Policy (Y2)   Geography (Y3)   Design (Y4)   Statistics (Y5)
1            3              2             3                6             5
2            7              7             7                4             4
3            4              4             4                4             4
4            5              5             5                5             5
5            7              6             6                4             5

EFA utilizes the outputs of PCA, but continues to rotate the data to investigate the relationship with underlying unobservable factors.
Because EFA utilizes PCA components, EFA does NOT completely maximize variance explained.
[Figure: Ecology, Policy, Geography, Design, and Statistics plotted on the PC 1 and PC 2 axes]
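
The five equations above can be written compactly in matrix form. The sketch below assumes the standard orthogonal factor model, which the slide does not state explicitly: factors with unit variance, uncorrelated with each other and with the errors, and errors uncorrelated across variables. The symbols $\mathbf{B}$ (the 5 x 2 matrix of loadings $\beta_{ij}$) and $\boldsymbol{\Psi}$ (the diagonal matrix of error variances $\psi_i$) are introduced here for illustration.

```latex
\mathbf{Y} = \boldsymbol{\beta}_0 + \mathbf{B}\,\mathbf{F} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{Y}) = \mathbf{B}\mathbf{B}^{\top} + \boldsymbol{\Psi},
\qquad
\operatorname{Var}(Y_i) = \underbrace{\beta_{i1}^{2} + \beta_{i2}^{2}}_{\text{communality}}
                        + \underbrace{\psi_i}_{\text{uniqueness}}
```

Under these assumptions, each variable's variance splits into a part shared with the factors (communality) and a part specific to that variable (uniqueness).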

Determining the number of factors
FA should ONLY include important factors.
There are a number of different ways to determine the number of factors you should use.
But the easiest way is to set the number of factors to the number of PCA components that explain a significant portion of the variation in the data.
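
A minimal sketch of this rule in R, using simulated data rather than the course data. The variable names, the simulated loadings, and the 70% cumulative-variance threshold are all assumptions made for illustration; what counts as a "significant portion" is a judgment call.

```r
# Hypothetical data: 6 observed variables driven by 2 simulated latent factors.
set.seed(42)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v6 = 0.8 * f2 + rnorm(n, sd = 0.6)
)

# PCA on standardized variables.
pca <- prcomp(dat, scale. = TRUE)

# Proportion of variance explained by each component, and its running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var       <- cumsum(var_explained)

# Assumed threshold: keep the smallest number of components reaching 70%.
threshold <- 0.70
n_factors <- which(cum_var >= threshold)[1]

print(round(cum_var, 3))
print(n_factors)
```

A scree plot (`screeplot(pca)`) is a common visual companion to this kind of cutoff.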

Exploratory Factor Analysis in R
Exploratory FA in R: factanal(datamatrix, factors = n, rotation = "type") (stats package)
datamatrix: the data matrix of the original predictor variables, NOT the principal components.
factors: the number of factors you want to include. Note: R will ONLY fit a number of factors that the number of variables can support; requesting too many stops with an error.
rotation: the statistical method used to rotate the data.

Rotation options fall into 2 categories:
Orthogonal rotation assumes your factors are uncorrelated. Function options: "varimax", "quartimax".
Oblique rotation assumes your factors are correlated. Function options: "promax", "oblimin".
Additional function options: "none", "simplimax", "cluster".
Only "varimax", "promax", and "none" work with the stats package alone; the other names require a rotation function of that name to be loaded (the GPArotation package, for instance, provides quartimax, oblimin, and simplimax).

When selecting your rotation method, consider the correlation between factors.
Simple criterion commonly used (Tabachnick & Fidell, 2007):
1. Start with the oblique option "promax" and look at the correlation between your factors (assume the underlying factors are correlated).
2. If the correlation between factors is < 0.32, then use the orthogonal option "varimax" (the default in R).

For more information: Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Upper Saddle River, NJ: Pearson Allyn & Bacon, pp. 646-647.
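
A minimal sketch of the call, run on simulated data. The data frame `dat` and its six variables are hypothetical stand-ins for a matrix of original variables. The second call illustrates that `factanal` refuses a number of factors that the number of variables cannot support (with 6 variables, at most 3 factors can be fitted).

```r
# Hypothetical data: 6 observed variables driven by 2 simulated latent factors.
set.seed(42)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Fit a 2-factor model on the original variables (not on principal components).
# "varimax" is the default rotation; it is written out here for clarity.
fit <- factanal(dat, factors = 2, rotation = "varimax")
print(fit, digits = 2, cutoff = 0.3)

# Requesting 4 factors from 6 variables is not identifiable, so factanal
# raises an error; tryCatch captures the message instead of stopping.
too_many <- tryCatch(
  factanal(dat, factors = 4),
  error = function(e) conditionMessage(e)
)
print(too_many)
```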

Popular rotation methods
Orthogonal (keeps the factor axes at 90°, assumes uncorrelated factors):
Varimax: minimizes the number of variables that have high loadings on each factor and works to make small loadings even smaller.
Quartimax: minimizes the number of factors needed to explain each variable.

Oblique (allows the angle between factor axes to be greater or less than 90°, assumes correlated factors):
Promax: raises the loadings to the power of four, which ultimately results in greater correlations among the factors and achieves a simple structure. Often preferred because of its speed with larger datasets.
Direct Oblimin: attempts to simplify the structure and the mathematics of the output.

Oblique rotation is more complex than orthogonal rotation, since it can involve one of two coordinate systems: a system of primary axes or a system of reference axes.
The most commonly used methods are Varimax and Promax.
For more information: Tabachnick & Fidell (2007), pp. 646-647.

Exploratory Factor Analysis in R
Tabachnick & Fidell criterion: first we want to look at the correlation between the factors derived from the analysis.
1. Start with the assumption that the factors are correlated, so use the oblique rotation option "promax".
2. If the correlations are < 0.32 (in absolute value), then the solution remains nearly orthogonal, and we should use the orthogonal rotation option "varimax".
If your factors are correlated, you should report the correlation value with your results.
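
The two-step criterion could be sketched as follows. The data are simulated with deliberately correlated factors, and all variable names are hypothetical. The factor correlation matrix is rebuilt from the `rotmat` component of the fit, the same calculation that `print()` uses for its "Factor Correlations" table; this assumes an R version whose `factanal` returns `rotmat`.

```r
# Hypothetical data: two latent factors simulated with correlation near 0.5.
set.seed(42)
n  <- 500
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(1 - 0.5^2) * rnorm(n)
dat <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Step 1: start with the oblique rotation.
fit_oblique <- factanal(dat, factors = 2, rotation = "promax")

# Factor correlation matrix recovered from the rotation matrix.
tmat <- solve(fit_oblique$rotmat)
phi  <- tmat %*% t(tmat)
r12  <- abs(phi[1, 2])

# Step 2: fall back to varimax only if the factors are nearly uncorrelated.
rotation_choice <- if (r12 < 0.32) "varimax" else "promax"
fit_final <- factanal(dat, factors = 2, rotation = rotation_choice)

print(round(phi, 2))
print(rotation_choice)
```

The sign of an estimated factor correlation is arbitrary (a factor can be reflected), which is why the comparison uses the absolute value.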

Exploratory Factor Analysis in R
Uniqueness: the proportion of each variable's variance that is NOT explained by the factors; a high value flags a variable that behaves differently from the rest.
Factor loadings: tell you the relationship between the calculated factors and the original variables.
Variance explained: how much of the data variance is explained by each factor.
P-value: tests the null hypothesis that the number of factors included in this analysis is sufficient to capture the underlying unobservable relationships.
Large P-value = do not reject the null hypothesis, i.e. YES, the number of factors in the analysis is sufficient.
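
Each of these quantities can be pulled out of the fitted object directly. The sketch below uses simulated data with hypothetical variable names; the proportion of variance per factor is computed as the sum of squared loadings divided by the number of variables, which is valid here because the rotation is orthogonal.

```r
# Hypothetical data: 6 observed variables driven by 2 simulated latent factors.
set.seed(42)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

fit <- factanal(dat, factors = 2, rotation = "varimax")

# Uniqueness: share of each variable's variance not explained by the factors.
uniqueness <- fit$uniquenesses

# Loadings as a plain 6 x 2 matrix (variables in rows, factors in columns).
L <- unclass(fit$loadings)

# Variance explained by each factor (orthogonal rotation assumed).
ss_loadings <- colSums(L^2)
prop_var    <- ss_loadings / nrow(L)

# P-value for the null hypothesis that 2 factors are sufficient.
p_value <- fit$PVAL

print(round(uniqueness, 2))
print(round(prop_var, 2))
print(p_value)
```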

Exploratory Factor Analysis in R
Typically, weak relationships between factors and the original variables are discarded from the analysis.
There is debate within the literature as to what the cutoff for a strong relationship should be. Stringent cutoffs are often not realistic for environmental data.
Common rule of thumb: a relationship is strong if the absolute value of the loading is > 0.4. But this should be evaluated for your data as to what logically makes sense.

In the Master's grades example:
Factor 1 has a strong relationship with grades in Ecology, Policy, and Geography.
Factor 2 has a strong relationship with grades in Statistics and a moderate relationship with Experimental Design.
Using rationale and the commonalities between the class subjects, we could infer that Factor 1 is likely related to writing ability and Factor 2 is likely related to mathematical ability.
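
One way to apply the 0.4 rule of thumb in R, shown on simulated data rather than the grades data. The variable names and simulated loadings are hypothetical, and the cutoff is the slide's rule of thumb, not a fixed standard.

```r
# Hypothetical data: 6 observed variables driven by 2 simulated latent factors.
set.seed(42)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

fit <- factanal(dat, factors = 2, rotation = "varimax")
L   <- unclass(fit$loadings)

# Flag loadings whose absolute value exceeds the rule-of-thumb cutoff.
cutoff <- 0.4
strong <- abs(L) > cutoff

# For each factor, list the variables with a strong relationship.
strong_vars <- lapply(colnames(L), function(f) rownames(L)[strong[, f]])
names(strong_vars) <- colnames(L)
print(strong_vars)

# The built-in print method can also hide weak loadings for readability.
print(fit$loadings, cutoff = cutoff, sort = TRUE)
```

Interpreting what each group of variables has in common is then a subject-matter judgment, as in the grades example.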

PCA vs EFA output
Renewable Resources MSc example.
[Figure: PCA output and EFA output, side by side]
Rotations were in multi-dimensional space (i.e. 5 variables were included).