TAMS39 Lecture 10
Principal Component Analysis and Factor Analysis

Martin Singull
Department of Mathematics, Mathematical Statistics
Linköping University, Sweden

Content - Lecture

- Principal component analysis (PCA)
- Factor analysis (FA)

Principal component analysis (PCA)

Principal component analysis (PCA) is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are data reduction and interpretation. Although p components are required to reproduce the total system variability, often much of this variability can be accounted for by a small number k of the principal components.

Let $x = (x_1, \ldots, x_p)'$ be a random vector with covariance matrix $\Sigma$. Algebraically, the principal components are particular linear combinations of the p random variables. Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system with $x_1, \ldots, x_p$ as the coordinate axes.

Consider the linear combinations
$$y_1 = a_1'x, \quad \ldots, \quad y_p = a_p'x,$$
with variances and covariances
$$\mathrm{var}(y_i) = a_i'\Sigma a_i, \quad \mathrm{cov}(y_i, y_k) = a_i'\Sigma a_k, \quad i, k = 1, \ldots, p.$$
The principal components are those uncorrelated linear combinations $y_1, \ldots, y_p$ whose variances above are as large as possible.

First principal component = the linear combination $y_1 = a_1'x$ that maximizes $\mathrm{var}(a_1'x)$ subject to $a_1'a_1 = 1$.

ith principal component = the linear combination $y_i = a_i'x$ that maximizes $\mathrm{var}(a_i'x)$ subject to $a_i'a_i = 1$ and $\mathrm{cov}(a_i'x, a_k'x) = 0$ for $k < i$, $i = 1, \ldots, p$.

Let $\lambda_1, \ldots, \lambda_p > 0$ be the eigenvalues of the matrix $\Sigma$ and let $H = (h_1, \ldots, h_p)$ be a $p \times p$ orthogonal matrix such that
$$H'\Sigma H = \mathrm{diag}(\lambda_1, \ldots, \lambda_p) = \Lambda,$$
so that $h_i$ is an eigenvector of $\Sigma$ corresponding to the eigenvalue $\lambda_i$. The covariance between any linear combination $a'x$ and the linear combination $h_i'x$ based on an eigenvector is given by
$$\mathrm{cov}(a'x, h_i'x) = a'\Sigma h_i = \lambda_i a'h_i.$$
Hence, $\mathrm{cov}(a'x, h_i'x) = 0$ is the same as $a$ and $h_i$ being orthogonal.

Theorem. For $k = 1, \ldots, p$,
$$\lambda_k = \max_{a'a = 1,\; a'h_i = 0,\; i = 1, \ldots, k-1} a'\Sigma a = h_k'\Sigma h_k.$$
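As a numerical illustration (not part of the slides), the theorem can be checked with a small numpy sketch; the covariance matrix Sigma below is an arbitrary made-up example.

```python
# Sketch: the eigenvalues of Sigma are the successive constrained maxima of a' Sigma a,
# attained at the corresponding eigenvectors. Sigma here is an arbitrary example.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T  # a symmetric positive definite "covariance" matrix

# eigh returns eigenvalues in ascending order; reverse to get lambda_1 >= ... >= lambda_p
eigval, eigvec = np.linalg.eigh(Sigma)
lam, H = eigval[::-1], eigvec[:, ::-1]

# The variance of the first principal component h_1' x equals lambda_1 ...
print(lam[0], H[:, 0] @ Sigma @ H[:, 0])

# ... and no random unit vector gives a larger value of a' Sigma a.
a = rng.standard_normal((1000, 4))
a /= np.linalg.norm(a, axis=1, keepdims=True)
print(np.max(np.einsum("ij,jk,ik->i", a, Sigma, a)) <= lam[0] + 1e-12)
```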


Measures of total variation

Note that in transforming to principal components the measures $\mathrm{tr}\,\Sigma$ and $|\Sigma|$ of total variation are unchanged, since
$$\mathrm{tr}\,\Sigma = \mathrm{tr}(H'\Sigma H) = \mathrm{tr}\,\Lambda = \sum_{i=1}^{p} \lambda_i, \qquad |\Sigma| = |H'\Sigma H| = |\Lambda| = \prod_{i=1}^{p} \lambda_i.$$
Note also that $\sum_{i=1}^{k} \lambda_i$ is the total variance of the first k principal components. In principal component analysis the hope is that for some small k this sum is close to $\mathrm{tr}\,\Sigma$, i.e., the first k principal components explain most of the variation in $x$, and the remaining $p - k$ principal components contribute little.

Sample principal component analysis

Assume that $x_1, \ldots, x_n$ are iid $N_p(\mu, \Sigma)$, i.e., $X = (x_1, \ldots, x_n) \sim N_{p,n}(\mu 1_n', \Sigma, I_n)$. The MLE of $\Sigma$ is $\frac{n-1}{n} S$, where $S$ is the sample covariance matrix given by
$$S = \frac{1}{n-1} X \left( I_n - 1_n (1_n' 1_n)^{-1} 1_n' \right) X'.$$
The MLEs of the $\lambda_i$'s, the ordered (assumed distinct) eigenvalues of $\Sigma$, are $\frac{n-1}{n} \hat\lambda_i$, where the $\hat\lambda_i$'s are the ordered eigenvalues of $S$. The $\hat\lambda_i$'s are distinct with probability one, since $n > p$.

The sample principal components are defined as follows.

First sample principal component = the linear combination $\hat y_1 = a_1'x$ that maximizes the sample variance $a_1'Sa_1$ subject to $a_1'a_1 = 1$.

ith sample principal component = the linear combination $\hat y_i = a_i'x$ that maximizes the sample variance $a_i'Sa_i$ subject to $a_i'a_i = 1$ and $a_i'Sa_k = 0$ for $k < i$, $i = 1, \ldots, p$.
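As a computational aside (not from the slides), the sample principal components can be obtained directly from the eigendecomposition of $S$; the data below are simulated and the data-generating covariance matrix is an arbitrary assumption.

```python
# Minimal sketch of sample PCA on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.multivariate_normal(mean=np.zeros(p),
                            cov=np.array([[4, 2, 0, 0],
                                          [2, 3, 1, 0],
                                          [0, 1, 2, 0],
                                          [0, 0, 0, 1]]), size=n)  # rows = observations

S = np.cov(X, rowvar=False)                      # unbiased sample covariance (divisor n - 1)
eigval, eigvec = np.linalg.eigh(S)
lam_hat, H_hat = eigval[::-1], eigvec[:, ::-1]   # ordered lambda_1 >= ... >= lambda_p

scores = (X - X.mean(axis=0)) @ H_hat            # sample principal component scores
print(lam_hat)                                   # sample variances of the components
print(np.cumsum(lam_hat) / lam_hat.sum())        # proportion of total variation explained
```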

We now have a theorem similar to the one above.

Theorem. For $k = 1, \ldots, p$,
$$\hat\lambda_k = \max_{a'a = 1,\; a'\hat h_i = 0,\; i = 1, \ldots, k-1} a'Sa = \hat h_k' S \hat h_k,$$
where $\hat h_k$ is an eigenvector of $S$ (equivalently, of the MLE $\frac{n-1}{n}S$ of $\Sigma$) corresponding to the eigenvalue $\hat\lambda_k$.

Asymptotic distributions

Assume that $x_1, \ldots, x_n$ are iid $N_p(\mu, \Sigma)$. Assume also that the eigenvalues of $\Sigma$ are distinct and positive, so that $0 < \lambda_p < \ldots < \lambda_1$. The sampling distributions of the MLEs $\hat\lambda_i$ and $\hat h_i$ are difficult to derive and beyond the scope of this course. We shall simply summarize some results.

Asymptotic mean and variance

Tractable expressions for the exact moments of the eigenvalues of $S$ are unknown, but asymptotic expressions for some of them have been found by Lawley (1956). Lawley has shown that if $\lambda_i$ is a distinct eigenvalue of $\Sigma$, the mean and variance of $\hat\lambda_i$ can be expanded for large $n$ as
$$E(\hat\lambda_i) = \lambda_i + \frac{\lambda_i}{n} \sum_{j=1, j \neq i}^{p} \frac{\lambda_j}{\lambda_i - \lambda_j} + O(n^{-2})$$
and
$$\mathrm{var}(\hat\lambda_i) = \frac{2\lambda_i^2}{n} \left( 1 - \frac{1}{n} \sum_{j=1, j \neq i}^{p} \left( \frac{\lambda_j}{\lambda_i - \lambda_j} \right)^2 \right) + O(n^{-3}).$$

Asymptotic distribution, cont.

Let
$$\lambda = (\lambda_1, \ldots, \lambda_p)' \quad \text{and} \quad E_i = \lambda_i \sum_{k=1, k \neq i}^{p} \frac{\lambda_k}{(\lambda_k - \lambda_i)^2} h_k h_k'.$$
Then we have

1. $\sqrt{n}(\hat\lambda - \lambda) \to N_p(0, 2\Lambda^2)$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, i.e.,
$$\frac{\sqrt{n}(\hat\lambda_i - \lambda_i)}{\sqrt{2}\,\lambda_i} \to N(0, 1) \quad \text{for } i = 1, \ldots, p,$$
2. $\sqrt{n}(\hat h_i - h_i) \to N_p(0, E_i)$,
3. $\hat\lambda_i$ and $\hat h_i$ are independently distributed for $i = 1, \ldots, p$.

Result 1 implies that, for large n, the $\hat\lambda_i$ are independently distributed. Using Result 1 one can also construct confidence intervals for the $\lambda_i$'s. Result 2 implies that the elements of each $\hat h_i$ are correlated, and the correlation depends to a large extent on the separation of the eigenvalues $\lambda_1, \ldots, \lambda_p$ (which are unknown) and the sample size n.
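For instance, Result 1 yields the usual large-sample interval $\hat\lambda_i/(1 + z_{\alpha/2}\sqrt{2/n}) \le \lambda_i \le \hat\lambda_i/(1 - z_{\alpha/2}\sqrt{2/n})$. A small Python sketch (the helper and the numbers in the example call are my own, purely illustrative):

```python
# Approximate confidence interval for a distinct eigenvalue lambda_i, based on
#   sqrt(n)(lambda_hat_i - lambda_i) / (sqrt(2) lambda_i)  ~  N(0, 1)  approximately.
import numpy as np
from scipy.stats import norm

def eigenvalue_ci(lam_hat_i, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for a distinct eigenvalue."""
    z = norm.ppf(1 - alpha / 2)
    lower = lam_hat_i / (1 + z * np.sqrt(2 / n))
    upper = lam_hat_i / (1 - z * np.sqrt(2 / n))  # requires z * sqrt(2/n) < 1
    return lower, upper

# Example with illustrative values: lambda_hat_i = 2.437 from n = 103 observations.
print(eigenvalue_ci(2.437, n=103))
```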

Test of H: $\lambda_{k+1} = \lambda_{k+2} = \ldots = \lambda_p = \lambda$

Suppose we want to test the hypothesis that the last $p - k$ eigenvalues of the covariance matrix $\Sigma$ are equal to $\lambda$. That is, we want to test
$$H: \lambda_{k+1} = \lambda_{k+2} = \ldots = \lambda_p = \lambda \quad \text{vs.} \quad A: \text{not } H,$$
where $\lambda$ is unknown. This is the so-called isotropy test for the eigenvalues. Typically one conducts a series of isotropy tests, starting with $k = p - 2$ and decreasing k as long as the null hypothesis is accepted.

PCA LRT

The LRT (based on normality) for the hypothesis H is based on the statistic
$$Q = \frac{\prod_{j=k+1}^{p} \hat\lambda_j}{\left( \frac{1}{p-k} \sum_{j=k+1}^{p} \hat\lambda_j \right)^{p-k}},$$
where $\hat\lambda_1 \geq \hat\lambda_2 \geq \ldots \geq \hat\lambda_p$ are the eigenvalues of the sample covariance matrix $S$ based on $f = n - 1$ degrees of freedom. It has been shown by Lawley (1956) that
$$Q^* = -\left( f - k - \frac{1}{6}\left( 2(p-k) + 1 + \frac{2}{p-k} \right) \right) \ln Q \quad \overset{\text{approx}}{\sim} \quad \chi^2\left( \tfrac{1}{2}(p-k)(p-k+1) - 1 \right).$$
Hence, H is rejected if $Q^* > \chi^2_{1-\alpha}\left( \tfrac{1}{2}(p-k)(p-k+1) - 1 \right)$.
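A sketch of how the isotropy test could be computed (the helper function and the example eigenvalues are my own, not from the slides):

```python
# Isotropy LRT: test H: lambda_{k+1} = ... = lambda_p from the eigenvalues of S and f = n - 1.
import numpy as np
from scipy.stats import chi2

def isotropy_test(lam_hat, k, f):
    """lam_hat: eigenvalues of S; k: number of eigenvalues left unrestricted."""
    lam_hat = np.sort(np.asarray(lam_hat, dtype=float))[::-1]
    p = lam_hat.size
    tail = lam_hat[k:]                                   # lambda_hat_{k+1}, ..., lambda_hat_p
    lnQ = np.sum(np.log(tail)) - (p - k) * np.log(tail.mean())
    c = f - k - (2 * (p - k) + 1 + 2 / (p - k)) / 6      # Lawley's scaling factor
    Qstar = -c * lnQ
    df = (p - k) * (p - k + 1) / 2 - 1
    return Qstar, 1 - chi2.cdf(Qstar, df)

# Example with made-up eigenvalues from a sample of size n = 103 (f = 102).
print(isotropy_test([2.4, 1.4, 0.5, 0.4, 0.3], k=2, f=102))
```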

Example PCA

The weekly rates of return for five stocks (JP Morgan, Citibank, Wells Fargo, Royal Dutch Shell, and ExxonMobil) listed on the New York Stock Exchange were determined for the period January 2004 through December 2005. The weekly rate of return is defined as
$$\frac{\text{current week closing price} - \text{previous week closing price}}{\text{previous week closing price}},$$
adjusted for stock splits and dividends. The observations in 103 successive weeks appear to be independently distributed, but the rates of return across stocks are correlated, because, as one expects, stocks tend to move together in response to general economic conditions.

Let $x_1, \ldots, x_5$ denote observed weekly rates of return for the stocks given above. Then
$$\bar x = (0.0011, 0.0007, 0.0016, 0.0040, 0.0040)'$$
and
$$R = \begin{pmatrix} 1 & 0.632 & 0.511 & 0.115 & 0.155 \\ & 1 & 0.574 & 0.322 & 0.213 \\ & & 1 & 0.183 & 0.146 \\ & & & 1 & 0.683 \\ & & & & 1 \end{pmatrix},$$
where $R$ is the sample correlation matrix $R = D_S^{-1/2} S D_S^{-1/2}$, with $D_S = \mathrm{diag}(s_{11}, \ldots, s_{55})$. We note that $R$ is the sample covariance matrix for the standardized observations
$$z_i = \frac{x_i - \bar x_i}{\sqrt{s_{ii}}}, \quad i = 1, \ldots, 5.$$

The eigenvalues and corresponding normalized eigenvectors of $R$ are

$\hat\lambda_1 = 2.437$, $\hat h_1 = (0.469, 0.532, 0.465, 0.387, 0.361)'$,
$\hat\lambda_2 = 1.407$, $\hat h_2 = (-0.368, -0.236, -0.315, 0.585, 0.606)'$,
$\hat\lambda_3 = 0.501$, $\hat h_3 = (0.604, 0.136, 0.772, 0.093, 0.109)'$,
$\hat\lambda_4 = 0.400$, $\hat h_4 = (0.363, 0.629, 0.289, 0.381, 0.493)'$,
$\hat\lambda_5 = 0.255$, $\hat h_5 = (0.384, 0.496, 0.071, 0.595, 0.498)'$.
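The reported eigen-analysis can be reproduced numerically from the rounded correlation matrix (a sketch, not part of the slides; eigenvector signs are arbitrary, so they may differ from the listed vectors by a factor of -1, and small rounding differences are expected):

```python
import numpy as np

R = np.array([[1.000, 0.632, 0.511, 0.115, 0.155],
              [0.632, 1.000, 0.574, 0.322, 0.213],
              [0.511, 0.574, 1.000, 0.183, 0.146],
              [0.115, 0.322, 0.183, 1.000, 0.683],
              [0.155, 0.213, 0.146, 0.683, 1.000]])

eigval, eigvec = np.linalg.eigh(R)
lam_hat, H_hat = eigval[::-1], eigvec[:, ::-1]          # decreasing order

print(np.round(lam_hat, 3))                             # approx. 2.437, 1.407, 0.501, 0.400, 0.255
print(np.round(lam_hat[:2].sum() / lam_hat.sum(), 2))   # proportion explained by first two PCs
```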

Using the standardized variables, we obtain the first two sample principal components:
$$\hat y_1 = \hat h_1'z = 0.469z_1 + 0.532z_2 + 0.465z_3 + 0.387z_4 + 0.361z_5,$$
$$\hat y_2 = \hat h_2'z = -0.368z_1 - 0.236z_2 - 0.315z_3 + 0.585z_4 + 0.606z_5,$$
and these components, which account for
$$\frac{\hat\lambda_1 + \hat\lambda_2}{p} = \frac{2.437 + 1.407}{5} = 0.77,$$
i.e., 77% of the total (standardized) sample variance, have interesting interpretations.

The first component is a roughly equally weighted sum, or index, of the five stocks. This component might be called a general stock-market component, or, simply, a market component. The second component represents a contrast between the banking stocks and the oil stocks. It might be called an industry component. Thus, we see that most of the variation in these stock returns is due to market activity and uncorrelated industry activity.

Factor analysis

Closely related to PCA is factor analysis. Factor analysis is a statistical method used to study the dimensionality of a set of variables. In factor analysis, latent variables represent unobserved constructs and are referred to as factors or dimensions. The essential purpose of factor analysis is to describe, if possible, the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors.

Factor analysis - Example

Suppose we wish to judge the abilities of high school students entering university. We may give them a test of 50 questions. These 50 questions, however, may fall into a few categories, such as reading comprehension, mathematics, and arts. Here we only have three factors.

The score of a randomly selected high school student on the ith question, denoted by $y_i$, can be modeled in the form
$$y_i = \mu_i + \lambda_{i1} f_1 + \lambda_{i2} f_2 + \lambda_{i3} f_3 + \varepsilon_i, \quad i = 1, \ldots, 50,$$
where $\mu_i$ is the mean of $y_i$. Without loss of generality we can assume that $f_j \overset{iid}{\sim} N(0, 1)$ for $j = 1, 2, 3$, independently of the errors $\varepsilon_i \overset{iid}{\sim} N(0, \psi_i)$ for $i = 1, \ldots, 50$.

Factor analysis - Model

In matrix notation, the model with k factors and p characteristics of a subject can be written as
$$y = \mu + \Lambda f + \varepsilon,$$
where $y = (y_1, \ldots, y_p)'$, $\mu = (\mu_1, \ldots, \mu_p)'$,
$$\varepsilon = (\varepsilon_1, \ldots, \varepsilon_p)' \sim N_p(0, \Psi), \quad f = (f_1, \ldots, f_k)' \sim N_k(0, I_k),$$
$$\Lambda = \begin{pmatrix} \lambda_{11} & \cdots & \lambda_{1k} \\ \vdots & & \vdots \\ \lambda_{p1} & \cdots & \lambda_{pk} \end{pmatrix} : p \times k,$$
and $\Psi = \mathrm{diag}(\psi_1, \ldots, \psi_p)$.

Factor analysis - Covariance matrix

Since $\mathrm{cov}(f) = I_k$, $\mathrm{cov}(\varepsilon) = \Psi$ and $\mathrm{cov}(f, \varepsilon) = 0$, it follows that the covariance matrix of $y$ is given by
$$\mathrm{cov}(y) = \Lambda\,\mathrm{cov}(f)\,\Lambda' + \mathrm{cov}(\varepsilon) = \Lambda\Lambda' + \Psi \equiv \Sigma.$$
Note that the value of $\Sigma$ is unchanged if $\Lambda$ is post-multiplied by any $k \times k$ orthogonal matrix. Hence, there is no unique choice of $\Lambda$.
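This rotation invariance is easy to verify numerically (a sketch with made-up $\Lambda$ and $\Psi$, not from the slides):

```python
# Check that Lambda Lambda' + Psi is unchanged when Lambda is post-multiplied
# by an orthogonal matrix G.
import numpy as np

rng = np.random.default_rng(2)
p, k = 5, 2
Lambda = rng.standard_normal((p, k))
Psi = np.diag(rng.uniform(0.1, 1.0, size=p))

Sigma = Lambda @ Lambda.T + Psi

G, _ = np.linalg.qr(rng.standard_normal((k, k)))   # a random k x k orthogonal matrix
Sigma_rot = (Lambda @ G) @ (Lambda @ G).T + Psi

print(np.allclose(Sigma, Sigma_rot))               # True: Sigma is rotation invariant
```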

Factor analysis - Number of factors

The number of parameters that need to be estimated is $pk$ for $\Lambda$ and $p$ for the diagonal elements of $\Psi$, totalling $p(k + 1)$. The number of quantities available for estimation is the $p(p + 1)/2$ elements of the sample covariance matrix $S$. Thus, in principle, the number of factors that can be selected to represent the data should be less than or equal to $(p - 1)/2$. We have noted that the value of $\Sigma$ is unchanged if $\Lambda$ is post-multiplied by any $k \times k$ orthogonal matrix. Thus, the effective number of parameters is not $p(k + 1)$, but $p(k + 1) - k(k - 1)/2$. Hence,
$$p(k + 1) - \frac{k(k - 1)}{2} \leq \frac{p(p + 1)}{2} \quad \Longleftrightarrow \quad k \leq \frac{1}{2}\left( (2p + 1) - \sqrt{8p + 1} \right).$$
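A small helper (my own, following the inequality above) evaluating this bound for a few values of p:

```python
# Largest number of factors k allowed by the counting argument for a given p.
import numpy as np

def max_factors(p):
    return int(np.floor(((2 * p + 1) - np.sqrt(8 * p + 1)) / 2))

for p in (5, 10, 20):
    print(p, max_factors(p))   # e.g. p = 5 allows at most k = 2 factors
```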

Factor analysis - Uniqueness

It should be mentioned, however, that if k satisfies the inequality, this does not necessarily imply that a solution exists, let alone a unique one. The inequality above was arrived at by requiring that the number of unknowns be less than or equal to the number of equations available. To get a unique solution, we not only require that k satisfies the inequality but also that the $k \times k$ matrix $\Lambda'\Psi^{-1}\Lambda$ is a diagonal matrix with diagonal elements ordered from largest to smallest (see Lawley and Maxwell, 1970).

Factor analysis - MLE

The estimates cannot be obtained explicitly, and iterative methods have to be used. To obtain the unique ML solution when factor analysis is carried out on the correlation matrix $R = D_S^{-1/2} S D_S^{-1/2}$, where $D_S = \mathrm{diag}(s_{11}, \ldots, s_{pp})$, we need to solve the equations
$$\sum_{j=1}^{k} \hat\lambda_{ij}^2 + \hat\psi_i = 1, \quad i = 1, \ldots, p,$$
$$\left( \hat\Psi^{-1/2} R\, \hat\Psi^{-1/2} \right) \left( \hat\Psi^{-1/2} \hat\Lambda \right) = \left( \hat\Psi^{-1/2} \hat\Lambda \right) \tilde D,$$
where $\tilde D$ is the diagonal matrix $\tilde D = I + D$ and $D$ is the diagonal matrix $D = \hat\Lambda'\hat\Psi^{-1}\hat\Lambda$ (uniqueness condition).
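In practice the iterative ML fit is done with existing software. The sketch below uses scikit-learn's FactorAnalysis on simulated data as one possible choice; this is an assumption about tooling, and its identification convention differs from the $\Lambda'\Psi^{-1}\Lambda$ condition above, so estimated loadings agree with other programs only up to rotation.

```python
# Iterative ML factor analysis on simulated, standardized data (assumed tooling).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n, p, k = 200, 5, 2
Lambda_true = rng.standard_normal((p, k))
psi_true = rng.uniform(0.2, 0.8, size=p)
F = rng.standard_normal((n, k))
Y = F @ Lambda_true.T + rng.standard_normal((n, p)) * np.sqrt(psi_true)  # y = Lambda f + eps

Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)   # standardize, i.e. work with R

fa = FactorAnalysis(n_components=k).fit(Z)
Lambda_hat = fa.components_.T                      # p x k estimated loadings
psi_hat = fa.noise_variance_                       # estimated specific variances

# For standardized data, communality + specific variance should be close to 1.
print(np.round((Lambda_hat ** 2).sum(axis=1) + psi_hat, 2))
```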

Factor analysis - Choosing the number of factors

In factor analysis, we seek a diagonal matrix $\Psi$ with positive diagonal elements such that $\Sigma - \Psi$ is a positive semidefinite matrix of rank k. It can be shown that such a k will always be larger than the number of eigenvalues of the population correlation matrix that are greater than one (see Guttman, 1954). Since the population correlation matrix can be estimated by the sample correlation matrix $R$, a rule of thumb often used in statistical packages chooses k to be the number of eigenvalues of $R$ greater than 1. This choice of k can be used as an initial guess for the number of factors.
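A minimal helper (my own) for this rule of thumb:

```python
# Rule of thumb: take k to be the number of eigenvalues of the sample
# correlation matrix R that exceed 1.
import numpy as np

def initial_num_factors(R):
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))

# For the stock-return correlation matrix above this gives k = 2
# (its eigenvalues are approximately 2.437, 1.407, 0.501, 0.400, 0.255).
```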

With this choice of k, we may test its adequacy by testing the hypothesis
$$H: \Sigma = \Lambda\Lambda' + \Psi \quad \text{vs.} \quad A: \Sigma \neq \Lambda\Lambda' + \Psi,$$
where $\Lambda$ is a $p \times k$ matrix. The MLE of $\Sigma$ under the alternative is $S$, and under $H$ with k factors it is given by $\hat\Sigma_k = \hat\Lambda_k \hat\Lambda_k' + \hat\Psi$, where $\hat\Lambda_k$ and $\hat\Psi$ are the MLEs for k factors.

Factor analysis - LRT

One can show that an asymptotic test statistic, based on the LRT, is given by
$$-n \ln\left( \frac{|S|}{|\hat\Sigma|} \right) \sim \chi^2(g) \quad \text{asymptotically},$$
where $g = \frac{1}{2}\left( (p - k)^2 - (p + k) \right)$. However, Bartlett (1954) suggested replacing $n$ in the above expression by $n - (2p + 4k + 11)/6$ to get a better approximation. This factor is known as Bartlett's correction. The hypothesis H is rejected if
$$-\left( n - \frac{2p + 4k + 11}{6} \right) \ln\left( \frac{|S|}{|\hat\Sigma|} \right) > \chi^2_{1-\alpha}(g).$$
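A sketch of the Bartlett-corrected test (my own helper, under the assumption that $S$, $\hat\Lambda$ and $\hat\Psi$ are already available from the ML fit):

```python
# Bartlett-corrected LRT for the adequacy of a k-factor model.
import numpy as np
from scipy.stats import chi2

def factor_model_lrt(S, Lambda_hat, Psi_hat, n, alpha=0.05):
    p, k = Lambda_hat.shape
    Sigma_hat = Lambda_hat @ Lambda_hat.T + Psi_hat
    stat = -(n - (2 * p + 4 * k + 11) / 6) * np.log(np.linalg.det(S) / np.linalg.det(Sigma_hat))
    g = ((p - k) ** 2 - (p + k)) / 2
    return stat, chi2.ppf(1 - alpha, g), 1 - chi2.cdf(stat, g)
```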

Example Factor analysis

The stock-price data above are analyzed assuming a k = 2 factor model and using the ML method.

Maximum likelihood estimates:

Variable                 Estimated factor loadings       Specific variances
                         $\hat\lambda_{i1}$   $\hat\lambda_{i2}$       $\hat\psi_i$ $(= 1 - \hat h_i^2)$
1. JP Morgan                 0.115    0.755              0.42
2. Citibank                  0.322    0.788              0.27
3. Wells Fargo               0.182    0.652              0.54
4. Royal Dutch Shell         1.000   -0.000              0.00
5. ExxonMobil                0.683   -0.032              0.53


Linköping University - Research that makes a difference