Chapter 9  Factor Analysis

Factor analysis may be viewed as a refinement of principal component analysis. The objective is, as in PC analysis, to describe the relevant variables under study in terms of a few underlying variables, called factors.

9.1 Orthogonal factor model

Let X = (X_1, ..., X_p)' be the variables in the population, with mean

    E(X) = μ = (μ_1, ..., μ_p)'

and variance matrix

    var(X) = Σ = (σ_ij)_{p×p}.

The orthogonal factor model is

    X_{p×1} − μ_{p×1} = L_{p×m} F_{m×1} + ε_{p×1},    (9.1)

where m ≤ p and μ = E(X). Here

    L = (l_ij)_{p×m} is called the factor loading matrix (which is non-random);
    F = (F_1, ..., F_m)' are called the factors or common factors;
    ε = (ε_1, ..., ε_p)' are called the errors or specific errors.

The model can be re-expressed componentwise as

    X_i − μ_i = sum_{j=1}^m l_ij F_j + ε_i,    i = 1, ..., p,    (9.1')

and l_ij is called the loading of X_i on the factor F_j.

The assumptions of the orthogonal model are:

(1) E(F) = 0_{m×1} and var(F) = I_m.
(2) E(ε) = 0_{p×1} and var(ε) = Ψ, a diagonal matrix with diagonal elements ψ_1, ..., ψ_p.
(3) cov(F, ε) = 0_{m×p}.

Remark. The above model assumptions imply that

    cov(F_i, F_j) = 1 if i = j, and 0 if i ≠ j;
    cov(F_i, ε_j) = 0;
    cov(ε_i, ε_j) = ψ_i if i = j, and 0 if i ≠ j.

Moreover,

    cov(X, F) = cov(LF, F) = L,    i.e.    cov(X_i, F_j) = l_ij,    i = 1, ..., p; j = 1, ..., m.

Under the orthogonal factor model, the variance matrix of X, Σ, can be written as

    Σ = var(X) = var(LF + ε) = var(LF) + var(ε) = L var(F) L' + Ψ = LL' + Ψ.
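The covariance identity Σ = LL' + Ψ is easy to check by simulation. The sketch below (a minimal illustration, with made-up values for L and Ψ) draws F and ε according to assumptions (1)-(3) and compares the sample covariance of the simulated X with LL' + Ψ:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 5, 2, 200_000

# Made-up loading matrix L (p x m) and diagonal specific variances psi.
L = rng.uniform(-1, 1, size=(p, m))
psi = rng.uniform(0.2, 0.8, size=p)

# Simulate X - mu = L F + eps with F ~ (0, I_m) and eps ~ (0, Psi), independent.
F = rng.standard_normal((n, m))
eps = rng.standard_normal((n, p)) * np.sqrt(psi)
X = F @ L.T + eps                        # take mu = 0 for simplicity

Sigma_model = L @ L.T + np.diag(psi)     # Sigma = L L' + Psi
Sigma_hat = np.cov(X, rowvar=False)      # sample covariance of the simulated X
print(np.abs(Sigma_hat - Sigma_model).max())   # small, and shrinks as n grows
```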

In particular,

    σ_ii = var(X_i) = sum_{j=1}^m l_ij^2 + ψ_i ≡ h_i^2 + ψ_i,    (9.2)
    σ_ij = cov(X_i, X_j) = sum_{k=1}^m l_ik l_jk,    i ≠ j.

Here h_i^2 ≡ l_i1^2 + ... + l_im^2 is called the communality; it is the portion of the variance of X_i explained by the common factors. ψ_i, called the specific variance or uniqueness, is the portion of the variance of X_i explained by the specific factor, the error pertaining to the i-th variable X_i only.

In the orthogonal factor model, the common factors are supposed to be important underlying factors that significantly affect all the variables. Beyond these common factors, the remaining sources of variation pertain only to their respective variables. Specifically, ε_i, the error pertaining to the i-th variable X_i, explains the part of the variation of X_i that cannot be explained by the common factors or by the other errors.

Remark. The orthogonal factor model (9.1) is essentially different from the linear regression model, although there is a certain formal resemblance. The key difference is that the common factor F, which seemingly plays the role of the covariates in a linear regression model, is not observable.

Remark. In the orthogonal factor model (9.1), L and F are unidentifiable up to a rotation, in the sense that

    X − μ = LF + ε = L*F* + ε,

where L* = LT and F* = T'F with T any orthonormal m×m matrix; (F*, ε) still satisfies assumptions (1)-(3). (A numerical check of this fact follows the derivation below.)

9.2 Estimation by the principal component approach

Recall the spectral decomposition of Σ:

    Σ = eΛe' = sum_{k=1}^p λ_k e_k e_k',

where (λ_k, e_k), k = 1, ..., p, are the p eigenvalue-eigenvector pairs of Σ with λ_1 ≥ ... ≥ λ_p > 0, e = (e_1 ... e_p) and Λ = diag(λ_1, ..., λ_p). The population PCs are

    Y = (Y_1, ..., Y_p)' = e'(X − μ).

Then

    X − μ = eY = sum_{j=1}^p e_j Y_j
          = sum_{j=1}^m (√λ_j e_j)(Y_j/√λ_j) + sum_{j=m+1}^p e_j Y_j
          = (√λ_1 e_1 ... √λ_m e_m)(Y_1/√λ_1, ..., Y_m/√λ_m)' + sum_{j=m+1}^p e_j Y_j
          = LF + ε, say,

where L = (√λ_1 e_1 ... √λ_m e_m), F = (Y_1/√λ_1, ..., Y_m/√λ_m)' and ε = sum_{j=m+1}^p e_j Y_j. It can be verified that assumptions (1) and (3) are satisfied (distinct PCs are uncorrelated, so var(F) = I_m and cov(F, ε) = 0), but not necessarily (2), since var(ε) = sum_{j=m+1}^p λ_j e_j e_j' need not be diagonal. Nevertheless, the above derivation provides a hint to using principal components as a possible solution to the orthogonal factor model. The common factors are simply the first m PCs standardized by their standard deviations:

    F_j = Y_j/√λ_j,    j = 1, ..., m.
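The rotation remark above can be checked numerically. A minimal sketch, using an arbitrary loading matrix and a random orthonormal T obtained from a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 5, 2

L = rng.uniform(-1, 1, size=(p, m))               # an arbitrary loading matrix
T, _ = np.linalg.qr(rng.standard_normal((m, m)))  # a random orthonormal m x m matrix

L_star = L @ T                                    # rotated loadings L* = L T
# L* L*' = L T T' L' = L L', so the implied covariance LL' + Psi is unchanged:
print(np.allclose(L_star @ L_star.T, L @ L.T))    # True
# Likewise var(T'F) = T' I_m T = I_m, so F* = T'F still satisfies assumption (1).
```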

We next build the sample analogues of the above derivation. There are n observations, presented as the n×p data matrix

    X = (X_(1) ... X_(p)) = (x_ik)_{n×p},

with X_(k) = (x_1k, ..., x_nk)' denoting the n observations of the k-th variable X_k. Let S be the sample variance matrix, with eigenvalue-eigenvector pairs (λ̂_k, ê_k), k = 1, ..., p, and λ̂_1 ≥ ... ≥ λ̂_p > 0, so that

    S = êΛ̂ê' = sum_{k=1}^p λ̂_k ê_k ê_k'.

Then, with the PC approach, the factor loading matrix L is estimated by the p×m matrix

    L̃ = (l̃_(1) ... l̃_(m)) = (√λ̂_1 ê_1 ... √λ̂_m ê_m).

Based on the fact that σ_ii = sum_{j=1}^m l_ij^2 + ψ_i, we use s_ii to estimate σ_ii. Then Ψ is estimated by Ψ̃ = diag(ψ̃_1, ..., ψ̃_p), where

    ψ̃_i = s_ii − sum_{j=1}^m l̃_ij^2,

and the communalities are estimated by h̃_i^2 = sum_{j=1}^m l̃_ij^2.

Example 9.1. Analysis of the weekly stock return data (Example 8.1 continued), with p = 5 stocks. The analysis is based on the sample correlation matrix, so s_ii = 1. Suppose m = 1; then λ̂_1 = 2.856 and

    L̃ = √λ̂_1 ê_1,

whose first entry is l̃_11 = 0.783. With m = 1, h̃_i^2 = l̃_i1^2 and ψ̃_i = s_ii − h̃_i^2 = 1 − h̃_i^2. For instance,

    h̃_1^2 = 0.783^2 = 0.61,    ψ̃_1 = 1 − 0.61 = 0.39;
    h̃_5^2 = 0.49,    ψ̃_5 = 1 − 0.49 = 0.51.

The proportion of the total variation explained by the first (and only) factor is

    sum_{i=1}^p l̃_i1^2 / sum_{i=1}^p s_ii = λ̂_1/(λ̂_1 + ... + λ̂_p) = 2.856/5 = 57.1%.
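The PC estimates L̃ and Ψ̃ are straightforward to compute. Below is a minimal sketch; the name `returns` in the usage comment is a hypothetical stand-in for the weekly stock return data, and, as in Example 9.1, the sample correlation matrix is the one being factored:

```python
import numpy as np

def pc_factor_analysis(X, m):
    """Principal component estimates of the loadings and specific variances
    (Section 9.2).  X is an n x p data matrix; m is the number of factors."""
    R = np.corrcoef(X, rowvar=False)     # sample correlation matrix (s_ii = 1)
    lam, E = np.linalg.eigh(R)           # eigenvalues in ascending order
    lam, E = lam[::-1], E[:, ::-1]       # reorder so lam_1 >= ... >= lam_p
    L = E[:, :m] * np.sqrt(lam[:m])      # L~ = (sqrt(lam_k) e_k), p x m
    h2 = np.sum(L**2, axis=1)            # communalities h~_i^2
    psi = np.diag(R) - h2                # psi~_i = s_ii - sum_j l~_ij^2
    prop = lam[:m].sum() / lam.sum()     # proportion of total variation explained
    return L, psi, h2, prop

# Usage with a hypothetical n x 5 matrix of weekly returns:
# L, psi, h2, prop = pc_factor_analysis(returns, m=1)
```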

9.3 Estimation by the maximum likelihood approach

Assume X_1, ..., X_n are iid N_p(μ, Σ). Then the likelihood is

    lik(μ, Σ) = (2π)^{−np/2} |Σ|^{−n/2} exp{ −(1/2) sum_{i=1}^n (X_i − μ)'Σ^{−1}(X_i − μ) }.

Under the orthogonal factor model (9.1), Σ = LL' + Ψ, and the likelihood becomes

    lik(μ, L, Ψ) = (2π)^{−np/2} |LL' + Ψ|^{−n/2} exp{ −(1/2) sum_{i=1}^n (X_i − μ)'(LL' + Ψ)^{−1}(X_i − μ) }.

With a certain restriction (a uniqueness condition, such as requiring L'Ψ^{−1}L to be diagonal), the MLEs of L and Ψ can be computed. We denote them by L̂ and Ψ̂. (The actual computation of the MLE is not required.)

9.4 A test of the number of common factors

The orthogonal factor model (9.1) pre-specifies m, the number of common factors. In practice, m is often unknown. Here we consider, for a given m, a statistical test of whether such an m is appropriate. Presented in terms of statistical hypotheses:

    H_0: Σ = LL' + Ψ, where L is a p×m matrix and Ψ is diagonal;
    H_a: otherwise.

A generalized likelihood ratio test statistic is

    2 log( max{lik(μ, Σ): μ, Σ} / max{lik(μ, L, Ψ): μ, L, Ψ} )
        = 2 log( lik(X̄, (n−1)S/n) / lik(X̄, L̂, Ψ̂) )
        = 2 log( |L̂L̂' + Ψ̂|^{n/2} / |(n−1)S/n|^{n/2} )
        = n log( |L̂L̂' + Ψ̂| / |(n−1)S/n| ),

where (n−1)S/n is the unrestricted MLE of Σ, and the exponential terms cancel because tr{(L̂L̂' + Ψ̂)^{−1}(n−1)S/n} = p at the MLE. With some further refinement, called the Bartlett correction, the appropriate significance level α test is:

    reject H_0 when [n − 1 − (2p + 4m + 5)/6] log( |L̂L̂' + Ψ̂| / |(n−1)S/n| ) > χ^2_ν(α),

with degrees of freedom ν = [(p − m)^2 − p − m]/2.
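A sketch of this test in code is given below. As one possible stand-in for the MLE pair (L̂, Ψ̂), it uses scikit-learn's EM-based FactorAnalysis, which maximizes the same Gaussian likelihood; since |L̂L̂' + Ψ̂| is invariant to rotations of L̂, the statistic does not depend on which rotation the fitting routine returns:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import FactorAnalysis

def bartlett_factor_test(X, m, alpha=0.05):
    """Bartlett-corrected LRT of H0: Sigma = L L' + Psi with m common factors
    (Section 9.4).  X is an n x p data matrix."""
    n, p = X.shape
    fa = FactorAnalysis(n_components=m).fit(X)       # EM-based Gaussian MLE
    L = fa.components_.T                             # p x m loading estimate
    Sigma0 = L @ L.T + np.diag(fa.noise_variance_)   # fitted L^ L^' + Psi^
    Sn = np.cov(X, rowvar=False) * (n - 1) / n       # unrestricted MLE (n-1)S/n
    _, logdet0 = np.linalg.slogdet(Sigma0)
    _, logdet1 = np.linalg.slogdet(Sn)
    stat = (n - 1 - (2 * p + 4 * m + 5) / 6) * (logdet0 - logdet1)
    df = ((p - m) ** 2 - p - m) / 2
    crit = chi2.ppf(1 - alpha, df)
    return stat, crit, stat > crit                   # reject H0 when stat > crit
```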

9.5 Factor rotation

As remarked in Section 9.1, the orthogonal factor model is identifiable only up to a rotation of the common factors or the factor loading matrix. In other words,

    X − μ = LF + ε = L*F* + ε

for L* = LT and F* = T'F, where T is any m×m orthonormal matrix. Therefore, up to a rotation, it is legitimate and often desirable to choose a pair (L, F) that achieves better interpretability. A criterion called the varimax criterion can be applied to find an optimal rotation. Let L̂* be the p×m rotated factor loading matrix with elements l̂*_ij. Define the scaled loadings l*_ij = l̂*_ij/ĥ_i and

    V = (1/p) sum_{j=1}^m [ sum_{i=1}^p (l*_ij)^4 − (1/p){ sum_{i=1}^p (l*_ij)^2 }^2 ],

which is the sum of the column-wise variances of the squares of the scaled factor loadings. Find the optimal l*_ij such that V achieves its maximum. Then the optimal rotated factor loading matrix has entries

    l̂*_ij = ĥ_i × (the optimal l*_ij).

9.6 Factor scores

Let x_1, ..., x_n be a sample of n observations of X that follows the orthogonal factor model (9.1). Write

    x_j − μ = L f_j + e_j.

Then f_j and e_j may be regarded as the realized but unobserved values of the common factors and the errors that produced the j-th observation x_j. Note that x_j is p×1, f_j is m×1 and e_j is p×1. Factor scores refer to estimates of f_j, denoted f̂_j or f̃_j. There are two commonly used methods of estimation; a code sketch of both appears after Section 9.7.

(i) Method 1: weighted/unweighted least squares. Notice that if one minimizes

    (x − μ − Lf)' Ψ^{−1} (x − μ − Lf)

over all m-dimensional vectors f, the minimizer is f = (L'Ψ^{−1}L)^{−1} L'Ψ^{−1}(x − μ). With this minimization we can obtain factor scores as follows.

(1) Maximum likelihood approach:

    f̂_j = (L̂'Ψ̂^{−1}L̂)^{−1} L̂'Ψ̂^{−1}(x_j − x̄),

where (L̂, Ψ̂) are the MLEs of (L, Ψ), and ê_j = x_j − x̄ − L̂f̂_j is the estimate of e_j.

(2) Principal component approach (unweighted least squares, with Ψ replaced by the identity):

    f̃_j = (L̃'L̃)^{−1} L̃'(x_j − x̄),

where L̃ is the estimate of L based on the PC approach, and ẽ_j = x_j − x̄ − L̃f̃_j is the estimate of e_j.

(ii) Method 2: regression method. The motivation for this method comes from linear regression. The orthogonal factor model implies

    var( (X', F')' ) = ( LL' + Ψ    L  )
                       (    L'     I_m ),

and, citing a proposition in Chapter 4 (the conditional mean of a multivariate normal),

    E(F | X = x) = 0 + L'(LL' + Ψ)^{−1}(x − μ) = L'Σ^{−1}(x − μ).

Then f_j is estimated by

    f̂_j = L̂'S^{−1}(x_j − x̄).

9.7 A general guideline

To perform a complete factor analysis, some guidelines are useful. The following steps are recommended:

1. Perform a principal component factor analysis, with care over the issue of standardization.
2. Perform a maximum likelihood factor analysis.
3. Compare the results of steps 1 and 2.
4. Change the number of common factors m and repeat steps 1-3.

For a large data set, split it in half, perform the above analysis on each half, and compare the results.
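As noted in Section 9.6, here is a minimal sketch of both factor-score formulas. The function names are illustrative; L is the estimated loading matrix and psi the vector of estimated specific variances from whichever fit (ML or PC) is being used:

```python
import numpy as np

def scores_wls(X, L, psi):
    """Weighted least squares scores (Method 1):
    f^_j = (L' Psi^{-1} L)^{-1} L' Psi^{-1} (x_j - xbar)."""
    Xc = X - X.mean(axis=0)                    # center at the sample mean
    W = L.T / psi                              # L' Psi^{-1}, Psi diagonal
    return np.linalg.solve(W @ L, W @ Xc.T).T  # n x m matrix of scores

def scores_regression(X, L):
    """Regression-method scores (Method 2): f^_j = L^' S^{-1} (x_j - xbar)."""
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)                # sample variance matrix S
    return (L.T @ np.linalg.solve(S, Xc.T)).T  # n x m matrix of scores

# Usage with estimates from, e.g., the PC fit sketched in Section 9.2:
# f = scores_wls(X, L, psi); e = (X - X.mean(axis=0)) - f @ L.T
```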
