Manabu Sato* and Masaaki Ito**
J. Japan Statist. Soc. Vol. 37

THEORETICAL JUSTIFICATION OF DECISION RULES FOR THE NUMBER OF FACTORS: PRINCIPAL COMPONENT ANALYSIS AS A SUBSTITUTE FOR FACTOR ANALYSIS IN ONE-FACTOR CASES

Manabu Sato* and Masaaki Ito**

Applying principal component analysis as a substitute for factor analysis, analysts often adopt the following greater-than-one rule to decide the number of factors, k: take k to be the number of eigenvalues of the correlation matrix that are greater than one. Another approach to deciding k is based on the scree graph. In the present paper, the adequacy of these rules in one-factor cases is discussed. On the basis of the results obtained, some recommendations for data analysis are given. Our approach is based on analytical expressions of the eigenvalues in some simple but practical cases. In deriving the theoretical results, we use a representation of a polynomial in terms of a remainder sequence. This technique is useful for finding the sign of a polynomial under inequality constraints, so the idea is also introduced.

Key words and phrases: Cubic equation, greater-than-one rule, number of factors, principal component analysis, representation of a polynomial in terms of a remainder sequence, scree test.

1. Introduction

Principal component analysis (PCA) is a branch of multivariate statistical analysis concerned with the internal relationships of a set of variables. It is well known that PCA and factor analysis (FA) resemble each other but have slightly different aims (Chap. 7 of Jolliffe (2002), Chap. 14 of Anderson (2003)). However, PCA is very often used for the same purpose as FA. In fact, when PCA is applied, analysts calculate not only principal components but also correlations between principal components and original variables (see, for example, Chatfield and Collins (1989)). Using these correlations, which are called factor loadings after FA, they quite often try to derive a latent structure.
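The greater-than-one rule is easy to state operationally. As a rough sketch (assuming Python with NumPy; the loading values below are illustrative, not taken from the paper), the rule applied to a one-factor correlation matrix of the form studied below in (1.2) looks like this:

```python
import numpy as np

def greater_than_one_rule(corr):
    """Greater-than-one rule: take the number of factors to be the
    count of eigenvalues of the correlation matrix exceeding one."""
    return int(np.sum(np.linalg.eigvalsh(corr) > 1.0))

# One-factor correlation structure P = lam lam' + Psi with Psi = diag(I - lam lam'),
# which makes every diagonal entry of P equal to one.
lam = np.array([0.8, 0.7, 0.6])      # illustrative loadings
P = np.outer(lam, lam)
np.fill_diagonal(P, 1.0)
print(greater_than_one_rule(P))      # prints 1: the rule recovers k = 1 here
```

Here exactly one eigenvalue exceeds one, by the bounds of Proposition 1 below, so the rule recovers the true one-factor structure.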
In this way, PCA is often used as a substitute for FA. Many attempts have been made to compare and contrast FA with PCA (see, for example, Bentler and Kano (1990), Schneeweiss and Mathes (1995), Schneeweiss (1997), and Ogasawara (2000)), while Jolliffe (2002, Section 7.3) summarized many results.

Received December 12, 2005. Revised June 16, 2006. Accepted September 29, 2006.
*Center for Foundational Arts and Sciences, Faculty of Health Sciences, Hiroshima Prefectural College of Health Sciences, Japan. Current address: Graduate School of Information Sciences, Hiroshima City University, 3-4-1 Ozuka-Higashi, Asa-Minami-ku, Hiroshima, Japan. satomnb@hiroshima-cu.ac.jp
**Graduate School of Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, Japan.

Deciding the number of factors, k, is an important problem in FA. When
researchers analyze a correlation matrix, they often adopt the following greater-than-one rule: the number of factors is taken as the number of eigenvalues of the correlation matrix that are greater than one. Hence we will investigate the adequacy of this rule in the present paper. Another approach to deciding k is based on the scree graph proposed by Cattell (1966). Cattell (1966, p. 249) noticed that this scree invariably began at the k-th eigenvalue when k was the true number of factors. This approach is also widely applied (see, for example, Jolliffe (2002)). Hence we also examine the adequacy of this scree test. Further, we study the properties of the greater-than-one rule and the scree test when the number p of variables increases.

Now we explain the approach, framework and assumptions of our investigation.

(i) FA model
We assume that the data follow an FA model, although PCA does not require a structural model. This assumption is natural because, when researchers want to interpret factor loadings calculated with PCA, it is implicitly assumed that the FA model holds. Therefore, let us assume that a p-dimensional observable vector x follows an FA model:

x = µ + Λf + u.

Here, µ = (µ_1, ..., µ_p)′ is a mean vector, Λ is a p × k (p > k) factor loading matrix of rank k, f = (f_1, ..., f_k)′ is a common factor vector and u = (u_1, ..., u_p)′ is an error term (see, for example, p. 6 of Lawley and Maxwell (1971)). The population variance-covariance matrix Σ of x can be written as Σ = ΛΛ′ + Ψ under the conditions that E{f} = 0, E{u} = 0, E{fu′} = O (a null matrix) and E{uu′} = Ψ (a diagonal matrix). Here, the prime (′) denotes a transposed vector or matrix.

(ii) One-factor case
We focus on one-factor cases (k = 1). When we confirm a latent structure in practical data analysis, the complete simple structure, that is, each row of Λ having only one nonzero element, is often assumed. This structure can be reduced to sets of one-factor cases (see, for example, Sato (1990)). Let λ denote a p × 1 loading vector; then

(1.1)  Σ = λλ′ + Ψ.

(iii) Correlation matrix
Since it is common to apply a correlation matrix instead of a variance-covariance matrix in view of the property of scale invariance, we consider a correlation matrix. Putting D = diag Σ, and resetting D^(−1/2)λ → λ and D^(−1/2)ΨD^(−1/2) → Ψ, from (1.1) we can express the population correlation matrix P as

(1.2)  P = λλ′ + Ψ.

(iv) Range of parameters
We give parameters λ = (λ_1, ..., λ_p)′ such that diag(I − λλ′) is positive
definite, and set Ψ = diag{ψ_1, ..., ψ_p} = diag(I − λλ′), where I is the unit matrix. We assume that 1 > λ_1 ≥ ⋯ ≥ λ_p > 0 and p ≥ 3 throughout this paper. The reasons for these assumptions are as follows. If λ_i ≥ 1, then the corresponding unique variance ψ_i = 1 − λ_i² becomes nonpositive. If λ_i = 0, then this loading is meaningless. If λ_i < 0, by inverting the sign of x_i (= µ_i + λ_i f_1 + u_i) and µ_i, we can change the sign of λ_i. By reordering the λ_i, we can assume without loss of generality that λ_1 ≥ ⋯ ≥ λ_p. In one-factor cases with λ_i ≠ 0, λ can be determined uniquely up to sign if and only if p ≥ 3 (Theorems 5.1 and 5.6 of Anderson and Rubin (1956)).

Section 2 presents the expression and properties of the eigenvalues of P defined by (1.2). Section 3 describes applications to principal component analysis. Section 4 examines whether some propositions guaranteed under (1.2) hold for a certain empirical data set which does not have the structure of (1.2) exactly. Section 5 gives some recommendations for data analysis and makes concluding remarks. The Appendix gives proofs of some theorems, lemmas and a proposition. To complete these proofs we used a representation of a polynomial in terms of a remainder sequence, and we explain its idea there.

2. Expression and properties of eigenvalues of the treated matrix

First, inequalities for the eigenvalues θ_1 ≥ ⋯ ≥ θ_p of P defined by (1.2) are introduced. It is difficult to give an explicit expression of θ_i in the generic case, but the next proposition gives upper and lower bounds for θ_i.

Proposition 1 (Upper and lower bounds for eigenvalues) [Theorem 5.1 of Sato (1992)]. The eigenvalues θ_i of P defined by (1.2) satisfy

1 + λ_1² + ⋯ + λ_{p−1}² ≥ θ_1 ≥ 1 + λ_2² + ⋯ + λ_p² > 1 > 1 − λ_p² ≥ θ_2 ≥ 1 − λ_{p−1}² ≥ θ_3 ≥ ⋯ ≥ θ_p ≥ 1 − λ_1².

The equalities 1 + λ_2² + ⋯ + λ_p² = θ_1 = 1 + λ_1² + ⋯ + λ_{p−1}² hold if and only if λ_1 = ⋯ = λ_p.
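These bounds can be checked numerically. A minimal sketch (NumPy assumed; the loading vector is an arbitrary example satisfying 1 > λ_1 ≥ ⋯ ≥ λ_p > 0):

```python
import numpy as np

lam = np.array([0.9, 0.7, 0.5, 0.3])           # 1 > lam_1 >= ... >= lam_p > 0
P = np.outer(lam, lam)
np.fill_diagonal(P, 1.0)                       # P = lam lam' + diag(I - lam lam')
theta = np.sort(np.linalg.eigvalsh(P))[::-1]   # theta_1 >= ... >= theta_p
p = lam.size

# 1 + lam_2^2 + ... + lam_p^2 <= theta_1 <= 1 + lam_1^2 + ... + lam_{p-1}^2
assert 1 + np.sum(lam[1:]**2) <= theta[0] <= 1 + np.sum(lam[:-1]**2)
# 1 - lam_{p+2-i}^2 >= theta_i >= 1 - lam_{p+1-i}^2 for i = 2, ..., p
for i in range(2, p + 1):
    assert 1 - lam[p + 1 - i]**2 + 1e-12 >= theta[i - 1] >= 1 - lam[p - i]**2 - 1e-12
print("Proposition 1 bounds hold")
```

The chain reflects the interlacing of the eigenvalues of the rank-one update λλ′ + Ψ with the diagonal entries ψ_i of Ψ.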
The equalities 1 − λ_{p+2−i}² = θ_i = 1 − λ_{p+1−i}² hold if and only if λ_{p+2−i} = λ_{p+1−i} (i = 2, ..., p).

We focus on the three-variable case, which is the most fundamental one. We can obtain explicit expressions of the eigenvalues by Cardano's or Lagrange's method for solving the associated cubic eigen-polynomial equation. These expressions are, however, unsuitable for investigating the properties of the eigenvalues because they include cubic roots of complex numbers. To this end, we express the eigenvalues in terms of cosine functions to obtain their properties.

Lemma 1 (Explicit expression of eigenvalues). In the case p = 3, the eigenvalues θ_1 > θ_2 ≥ θ_3 of P defined by (1.2) can be expressed as follows:

θ_1 = 1 + (2/√3) √(λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²) cos(φ/3),
θ_2 = 1 − (2/√3) √(λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²) cos((φ + π)/3),
θ_3 = 1 − (2/√3) √(λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²) cos((φ − π)/3),

where cos φ = 3√3 λ_1²λ_2²λ_3² / (λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²)^(3/2) and 0 ≤ φ < π/2. The equality θ_2 = θ_3 holds if and only if λ_1 = λ_2 = λ_3, or equivalently, φ = 0.

A proof of this lemma is given in Subsection A.2 of the Appendix.

When analysts investigate a latent structure of the treated data in practice, they do not always examine the loadings precisely; they simply check whether they are of large or small magnitude. Thus, we may treat a simple case in which the loading vector λ consists of two magnitude levels, say α (large) and β (small). In this case, we can obtain the eigenvalues of P explicitly, as the following lemma shows.

Lemma 2 (Eigenvalues for a simple case). In the case λ = (α, ..., α, β, ..., β)′, with p_1 copies of α and p_2 copies of β, where 1 > α ≥ β > 0 and p = p_1 + p_2 ≥ 3, the eigenvalues θ_1 > θ_2 ≥ ⋯ ≥ θ_p of P defined by (1.2) are given as follows:

θ_1 = 1 + (1/2)((p_1 − 1)α² + (p_2 − 1)β² + √((p_1 − 1)²α⁴ + (p_2 − 1)²β⁴ + 2(p_1 p_2 + p_1 + p_2 − 1)α²β²)).

Further, θ_2, ..., θ_p are expressed as follows:
(i) If p_1 = 1 and p_2 ≥ 2, then θ_2 = ⋯ = θ_{p−1} = 1 − β² (multiplicity p − 2 = p_2 − 1) and θ_p = 1 + (1/2)((p_2 − 1)β² − √((p_2 − 1)²β⁴ + 4p_2 α²β²)).
(ii) If p_1 ≥ 2 and p_2 = 1, then θ_2 = 1 + (1/2)((p_1 − 1)α² − √((p_1 − 1)²α⁴ + 4p_1 α²β²)) < 1 − β² and θ_3 = ⋯ = θ_p = 1 − α² (multiplicity p_1 − 1).
(iii) If p_1 ≥ 2 and p_2 ≥ 2, then θ_2 = ⋯ = θ_{p_2} = 1 − β² (multiplicity p_2 − 1), θ_{p_2+1} = 1 + (1/2)((p_1 − 1)α² + (p_2 − 1)β² − √((p_1 − 1)²α⁴ + (p_2 − 1)²β⁴ + 2(p_1 p_2 + p_1 + p_2 − 1)α²β²)) and θ_{p_2+2} = ⋯ = θ_p = 1 − α² (multiplicity p − p_2 − 1 = p_1 − 1).

A proof of this lemma is given in Subsection A.3 of the Appendix.
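The closed forms of Lemma 2 can be compared against a direct numerical eigendecomposition. A sketch (NumPy assumed; α, β, p_1, p_2 are arbitrary values falling under case (iii)):

```python
import numpy as np

def theta1_lemma2(alpha, beta, p1, p2):
    """Largest eigenvalue of P for the two-level loading vector of Lemma 2."""
    s = (p1 - 1) * alpha**2 + (p2 - 1) * beta**2
    disc = ((p1 - 1)**2 * alpha**4 + (p2 - 1)**2 * beta**4
            + 2 * (p1 * p2 + p1 + p2 - 1) * alpha**2 * beta**2)
    return 1 + 0.5 * (s + np.sqrt(disc))

alpha, beta, p1, p2 = 0.8, 0.4, 3, 2                  # case (iii): p1 >= 2, p2 >= 2
lam = np.array([alpha] * p1 + [beta] * p2)
P = np.outer(lam, lam)
np.fill_diagonal(P, 1.0)
theta = np.sort(np.linalg.eigvalsh(P))[::-1]

assert np.isclose(theta[0], theta1_lemma2(alpha, beta, p1, p2))
assert np.isclose(theta[1], 1 - beta**2)              # multiplicity p2 - 1 = 1
assert np.allclose(theta[-(p1 - 1):], 1 - alpha**2)   # multiplicity p1 - 1 = 2
print("Lemma 2 closed forms match")
```

The repeated eigenvalues 1 − β² and 1 − α² come directly from the two-level structure of the loading vector.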
3. Application to principal component analysis as a substitute for factor analysis

We know the upper and lower bounds of θ_1 and θ_2 from Proposition 1. Further, owing to Lemma 1, we can investigate the properties of the eigenvalues precisely in the three-variable case. The next theorem shows the behavior of θ_1 with respect to a factor loading.

Theorem 1 (Partial derivatives of the largest eigenvalue). In the case p = 3, the largest eigenvalue θ_1 of P defined by (1.2) satisfies

∂θ_1/∂λ_i > 0  (i = 1, 2, 3)

except in the case λ_1 = λ_2 = λ_3.

A proof of this theorem is given in Subsection A.4 of the Appendix. The next theorem shows the behavior of θ_1 − θ_2, which is used with the scree test.

Theorem 2 (Partial derivatives of the difference between the first and second eigenvalues). In the case p = 3, the difference Δ = θ_1 − θ_2 between the first and the second eigenvalues of P defined by (1.2) satisfies

∂Δ/∂λ_i > 0  (i = 1, 2, 3)

except in the case λ_1 = λ_2 = λ_3.

A proof of this theorem is given in Subsection A.5 of the Appendix. The next proposition shows the behavior of the eigenvalues in the two-level loading case, for which we obtain stronger results.

Proposition 2 (Behavior of eigenvalues for a simple case). Assume that λ = (α, ..., α, β, ..., β)′, with p_1 copies of α and p_2 copies of β, where 1 > α ≥ β > 0 and p = p_1 + p_2 ≥ 3. Let θ_1 and θ_2 be the first and the second eigenvalues of P defined by (1.2), respectively. Then the behavior of θ_1, θ_2 and Δ = θ_1 − θ_2 is as follows:
(i) θ_1 increases as α, β, p_1 or p_2 increases.
(ii) ∂θ_2/∂β < 0.
(iii) Δ increases as α, β, p_1 or p_2 increases.

This proposition can easily be proved from Lemma 2. The next proposition shows the behavior of the eigenvalues when the number of variables increases.

Proposition 3 (Behavior of eigenvalues when one loading is added). Consider a loading vector λ̃ = (λ_1, ..., λ_{p+1})′ obtained by adding one loading to λ, where 1 > λ_i > 0 (i = 1, ..., p + 1). Let P̃ = λ̃λ̃′ + Ψ̃, where Ψ̃ = diag{ψ_1, ..., ψ_{p+1}} and ψ_i = 1 − λ_i² (i = 1, ..., p + 1). Take θ̃_1 ≥ ⋯ ≥ θ̃_{p+1} and θ_1 ≥ ⋯ ≥ θ_p to be the eigenvalues of P̃ and those of P defined by (1.2), respectively. Then the inequalities

θ̃_j ≥ θ_j  (j = 1, ..., p)

between the eigenvalues of P̃ and P are established. This proposition can be proved directly by using the Sturmian separation theorem (see, for example, Section 1f.2 (vi) of Rao (1973)).

Proposition 3 shows the behavior of each eigenvalue when one loading is added. In the next proposition, we deal with the difference between the first and the second eigenvalues for the following simple case: another factor loading γ is added to the simple loading vector in which all loadings are equal to α.

Proposition 4. Consider a loading vector λ̃ = (α, ..., α, γ)′ of (p + 1) variables obtained by adding one factor loading γ to λ = (α, ..., α)′, where 1 > α, γ > 0 and p ≥ 3. Let P̃ = λ̃λ̃′ + Ψ̃, where Ψ̃ = diag{ψ_1, ..., ψ_{p+1}}, ψ_i = 1 − α² (i = 1, ..., p) and ψ_{p+1} = 1 − γ². Let the eigenvalues of P̃ be θ̃_1 ≥ ⋯ ≥ θ̃_{p+1}. If

(3.1)  γ ≥ (√(2p − 1) / (2√p)) α,

then θ̃_1 − θ̃_2 ≥ θ_1 − θ_2 holds.

A proof of this proposition is given in Subsection A.6 of the Appendix. This proposition suggests the following: as a large enough loading γ is added, the difference between the first and second eigenvalues increases. As p increases,

Table 1. Minimum value of γ such that θ̃_1 − θ̃_2 ≥ θ_1 − θ_2.

p       minimum γ
3       .645α
4       .661α
5       .671α
→ ∞     .707α
a larger loading is required, because the effect of the additional loading is weakened. Table 1 shows the minimum value of γ in Formula (3.1) when p is given.

4. Examination of Propositions 3 and 4 for a certain empirical data set

In practical data analysis, since the population correlation matrix P is unknown, it is estimated with a sample correlation matrix R. Although P satisfies Formula (1.2), R does not have a decomposition such as (1.2) exactly. Hence, for a certain empirical data set, we examine whether some propositions shown in the present paper on the basis of (1.2) hold or not. We treat the famous data set introduced by Spearman (1904), in which he originated factor analysis. The data consist of measurements on six variables, and the sample size is 33. The number of factors for this set of data is known to be one.

Let R^(j) (j = 3, 4, 5, 6) be the sample correlation matrix of the first j variables and θ̂_i^(j) be the i-th eigenvalue of R^(j). Table 2 shows the eigenvalues of R^(j). We can see that Proposition 3 holds under the condition that the sample correlation matrix is regarded as the population one. With regard to Proposition 4, we can recognize that the difference between the first and the second eigenvalues enlarges when the number of variables increases. In fact, (θ̂_1^(4) − θ̂_2^(4)) − (θ̂_1^(3) − θ̂_2^(3)) = .586 > 0, (θ̂_1^(5) − θ̂_2^(5)) − (θ̂_1^(4) − θ̂_2^(4)) = .313 > 0 and (θ̂_1^(6) − θ̂_2^(6)) − (θ̂_1^(5) − θ̂_2^(5)) = .397 > 0.

Table 2. Eigenvalues of the first 3-6 variables of the correlation matrix for Spearman's data.

5. Recommendations for data analysis and concluding remarks

On the basis of our results, we give the following recommendations for data analysis:
(i) It is desirable to increase the number of variables (test items), because the largest eigenvalue increases as the number of variables increases (Proposition 3), and, as a large enough loading is added, the difference Δ used in the scree test increases (Proposition 4).
(ii) It is desirable to select large loadings, because θ_1 and Δ increase as the loadings increase (Theorems 1 and 2).

After a set of data to be analyzed is collected, of course, it is impossible to follow these recommendations. However, when designing a questionnaire or making up a test battery, it is desirable to do so.
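The effects behind recommendation (i), Propositions 3 and 4, can be illustrated numerically. A sketch (NumPy assumed; p, α and the added loading γ are illustrative choices, with γ taken above the bound (3.1)):

```python
import numpy as np

def sorted_eigs(lam):
    """Eigenvalues (descending) of P = lam lam' + diag(I - lam lam')."""
    P = np.outer(lam, lam)
    np.fill_diagonal(P, 1.0)
    return np.sort(np.linalg.eigvalsh(P))[::-1]

p, alpha = 4, 0.7
th = sorted_eigs(np.full(p, alpha))          # equal loadings: gap = p * alpha^2

gamma = 0.5                                  # above sqrt(2p-1)/(2 sqrt(p)) * alpha ~ 0.463
th_new = sorted_eigs(np.append(np.full(p, alpha), gamma))

assert np.all(th_new[:p] >= th - 1e-12)          # Proposition 3: every theta_j grows
assert th_new[0] - th_new[1] > th[0] - th[1]     # Proposition 4: the scree gap widens
print("adding a large enough loading enlarges theta_1 and the scree gap")
```

With γ below the bound the gap can shrink instead, which is why (3.1) appears as a condition in Proposition 4.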
Since the sample size of the data set described in Section 4 is only 33, the fluctuation of its sample correlation matrix R is large, so R does not have a decomposition such as Formula (1.2) exactly. However, some properties based on (1.2) hold for this data set. Therefore, we may expect that many sets of empirical data have such properties.

Proposals of new estimators and simulation studies have been carried out for a long time (see, for example, ten Berge and Kiers (1991) and Hoyle and Duvall (2004)). In contrast, our approach in the present paper is based on analytical expressions of the eigenvalues in some simple but practical cases. For the three-variable case, each eigenvalue can be expressed explicitly in terms of cosine functions (Lemma 1), and we can explore their properties. The results obtained are consistent with the experience of applied users. In deriving Lemma 1 as well as Theorems 1 and 2, the representation of the polynomials concerned in terms of remainder sequences is efficient. In the present study, we have used MATHEMATICA (Wolfram (1996)), a computer algebra system, to perform the extremely tedious algebraic calculations.

In multi-factor cases (k ≥ 2) that cannot be reduced to sets of one-factor cases, the substitution of PCA for FA is sometimes inadequate (see Section 6 of Sato (1992) and Sato and Ito (2003)).

Appendix

In this appendix, proofs of the theorems, lemmas and proposition in Sections 2 and 3 are shown. In some of these proofs, we need to find the sign of polynomials in the variables λ_1, λ_2 and λ_3. In the present cases, we can assume without loss of generality that the λ_i are written in decreasing order of magnitude, so λ_1 ≥ λ_2 ≥ λ_3. In other words, λ_i² − λ_j² is non-negative provided that i < j. To find the sign of a given polynomial, we use the non-negativity of λ_i² − λ_j², rewriting the polynomial in terms of a remainder sequence with respect to the polynomials λ_i² − λ_j².
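The sign-determination idea can be sketched with a computer algebra system. The paper used MATHEMATICA; the toy example below uses SymPy instead (an assumption of this sketch, not the authors' code) on a polynomial that is non-negative whenever x_1 ≥ x_2 > 0, although that is not obvious from its expanded form:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)

f = x1**2 + x1*x2 - 2*x2**2      # to show: f >= 0 whenever x1 >= x2 > 0

# Divide f by the non-negative constraint polynomial p1 = x1 - x2,
# giving f = p1*q + r (a one-step remainder-sequence representation).
p1 = x1 - x2
q, r = sp.div(f, p1, x1)

# Here r = 0 and q = x1 + 2*x2 is a sum of positive terms, so f >= 0
# on the constrained region; longer remainder sequences work the same way.
print(q, r)
```

The proofs below do exactly this, but with two or three constraint polynomials λ_i² − λ_j² and much larger quotients and remainders.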
We shall illustrate the idea of a representation of a polynomial in terms of a remainder sequence in Subsection A.1. This method is generally applicable for finding the sign of polynomials under inequality constraints.

A.1. Representation of a polynomial in terms of a remainder sequence

Let f(x_1, x_2, ..., x_n) be a real-valued polynomial in the real variables x_1, x_2, ..., x_n, and denote it by f(x). Let p_1(x) be a non-negative polynomial associated with a constraint; for example, if a constraint is x_1 ≥ x_2, then we adopt the non-negative polynomial p_1(x) = x_1 − x_2. Then f(x) can be divided by the polynomial p_1(x) to obtain a quotient q_1(x) and a remainder r_1(x):

f(x) = p_1(x)q_1(x) + r_1(x).

This process can be repeated with q_1(x) and a non-negative polynomial p_2(x) related to another constraint to obtain a quotient q_2(x) and a remainder r_2(x):

q_1(x) = p_2(x)q_2(x) + r_2(x).

Repeating this process, we obtain the following sequence of polynomials:

q_2(x) = p_3(x)q_3(x) + r_3(x),
...
q_{m−2}(x) = p_{m−1}(x)q_{m−1}(x) + r_{m−1}(x),
q_{m−1}(x) = p_m(x)q_m(x) + r_m(x).

Using the above sequence of polynomials, f(x) can be written, like synthetic division, as follows:

f(x) = ((⋯(p_m(x)q_m(x) + r_m(x))p_{m−1}(x) + r_{m−1}(x))p_{m−2}(x) + ⋯ + r_2(x))p_1(x) + r_1(x).

We call this expression a representation of a polynomial in terms of a remainder sequence. To show that f(x) ≥ 0, we need only show that r_i(x) ≥ 0 (i = 1, 2, ..., m) and q_m(x) ≥ 0. Here we should note that the representation of a given polynomial depends on the choice of the non-negative polynomials p_i(x). In the following proofs, we have to handle polynomials in three variables with total degree 12 or 16. Therefore, we can hardly perform our proofs without using computer algebra. Hence, we used the computer algebra system MATHEMATICA to choose p_i(x) such that we can easily prove f(x) ≥ 0 and to obtain the quotients q_i(x) and the remainders r_i(x).

A.2. Proof of Lemma 1

The eigenvalues θ of P are solutions of the polynomial equation det(P − θI) = 0. In the three-variable case, using (1.2), this equation is expressed as

θ³ − 3θ² − (λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3² − 3)θ − (2λ_1²λ_2²λ_3² − λ_1²λ_2² − λ_1²λ_3² − λ_2²λ_3² + 1) = 0.

To apply the formula for the solution of a cubic equation (see, for example, p. 10 of Beyer (1987)), we eliminate the θ² term. Letting x = θ − 1, we have

(A.1)  x³ − (λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²)x − 2λ_1²λ_2²λ_3² = 0.

Using cosine functions, we can express the solutions x_1, x_2, x_3 of equation (A.1) as follows:

(A.2)  x_1 = (2/√3) √(λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²) cos(φ/3),
(A.3)  x_2 = −(2/√3) √(λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²) cos((φ + π)/3),
(A.4)  x_3 = −(2/√3) √(λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²) cos((φ − π)/3),
where cos φ = 3√3 λ_1²λ_2²λ_3² / (λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²)^(3/2) and 0 ≤ φ < π/2.

Next, we show the inequality θ_1 > θ_2. From (A.2) and (A.3), with θ_1 = x_1 + 1 and θ_2 = x_2 + 1, we have

θ_1 − θ_2 = x_1 − x_2 = 2 √(λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²) cos((2φ + π)/6).

Since 0 ≤ φ < π/2, we have cos((2φ + π)/6) > 0. Thus θ_1 > θ_2.

We prove the inequality θ_2 ≥ θ_3. From (A.3) and (A.4), with θ_2 = x_2 + 1 and θ_3 = x_3 + 1, we have

(A.5)  θ_2 − θ_3 = x_2 − x_3 = 2 √(λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²) sin(φ/3).

Since 0 ≤ φ < π/2, we have sin(φ/3) ≥ 0. Consequently θ_2 ≥ θ_3, and thus θ_1 > θ_2 ≥ θ_3.

Finally, we show that the equality θ_2 = θ_3 holds if and only if λ_1 = λ_2 = λ_3. From (A.5), the equality θ_2 = θ_3 holds if and only if φ = 0, or equivalently,

3√3 λ_1²λ_2²λ_3² / (λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²)^(3/2) = 1.

Putting f_1 = (λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3²)³ − 27λ_1⁴λ_2⁴λ_3⁴, we represent it as a polynomial in terms of a remainder sequence:

f_1 = (((λ_1² − λ_3²)q_13 + r_13)(λ_2² − λ_3²) + r_12)(λ_1² − λ_2²) + r_11,

where

q_13 = λ_1²λ_2⁴ + 4λ_1²λ_2²λ_3² + λ_2⁶ + 8λ_2⁴λ_3² + 7λ_3⁴(λ_1² − λ_2²),
r_13 = (λ_2⁴ + 8λ_2²λ_3² + λ_3⁴)(λ_2² + λ_3²)(λ_2² − λ_3²),
r_12 = λ_3⁶(8λ_1² + λ_3²)(λ_1² − λ_3²),
r_11 = λ_2⁶(λ_2² + 8λ_3²)(λ_2² − λ_3²)².

Thus f_1 ≥ 0, because we assume λ_1 ≥ λ_2 ≥ λ_3 > 0 as described in Section 1. We see that f_1 = 0 if and only if λ_1 = λ_2 = λ_3, or equivalently, φ = 0.

A.3. Proof of Lemma 2

The eigen-polynomial equation det(P − θI) = 0 yields

(θ − (1 − α²))^(p_1 − 1) (θ − (1 − β²))^(p_2 − 1) {θ² − (2 + (p_1 − 1)α² + (p_2 − 1)β²)θ + 1 + (p_1 − 1)α² + (p_2 − 1)β² + (1 − p_1 − p_2)α²β²} = 0.

Therefore, θ = 1 − α² (multiplicity p_1 − 1) and θ = 1 − β² (multiplicity p_2 − 1), and the other solutions can be obtained easily by solving the quadratic equation within the braces.
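The cosine expressions derived in A.2 can be checked against a direct eigendecomposition. A sketch (NumPy assumed; the three loadings are arbitrary values with 1 > λ_1 ≥ λ_2 ≥ λ_3 > 0):

```python
import numpy as np

def lemma1_eigs(l1, l2, l3):
    """Eigenvalues of P (p = 3) via the cosine expressions of Lemma 1."""
    nu = l1**2 * l2**2 + l1**2 * l3**2 + l2**2 * l3**2
    phi = np.arccos(3 * np.sqrt(3) * l1**2 * l2**2 * l3**2 / nu**1.5)
    c = 2 / np.sqrt(3) * np.sqrt(nu)
    return (1 + c * np.cos(phi / 3),
            1 - c * np.cos((phi + np.pi) / 3),
            1 - c * np.cos((phi - np.pi) / 3))

l1, l2, l3 = 0.8, 0.6, 0.5
P = np.outer([l1, l2, l3], [l1, l2, l3])
np.fill_diagonal(P, 1.0)
numeric = np.sort(np.linalg.eigvalsh(P))[::-1]

closed = lemma1_eigs(l1, l2, l3)
assert np.allclose(sorted(closed, reverse=True), numeric)
print("Lemma 1 cosine expressions match the numerical eigenvalues")
```

The three closed-form values also sum to 3, the trace of a 3 × 3 correlation matrix, which is a quick consistency check.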
A.4. Proof of Theorem 1

Originally, the range of φ is 0 ≤ φ < π/2 from Lemma 1. However, we exclude the endpoint φ = 0 because we treat the derivative with respect to φ. Letting ν = λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3² and ζ = 3√3 λ_1²λ_2²λ_3² ν^(−3/2), we have

θ_1 = 1 + (2/√3) ν^(1/2) cos(φ/3),  φ = cos⁻¹ ζ,

and hence

∂θ_1/∂λ_i = (2/√3) { (1/(2ν^(1/2))) (∂ν/∂λ_i) cos(φ/3) − (ν^(1/2)/3) sin(φ/3) (dφ/dζ)(∂ζ/∂λ_i) }
          = (2/(√3 ν^(1/2))) cos(φ/3) { λ_i(λ_j² + λ_k²) − (ν/3) tan(φ/3) (dφ/dζ)(∂ζ/∂λ_i) }

for i, j, k = 1, 2, 3; i ≠ j, i ≠ k, j ≠ k. Since

sup_{0<φ<π/2} tan(φ/3) = 1/√3,
dφ/dζ = −1/√(1 − ζ²) = −ν^(3/2)/√(ν³ − 27λ_1⁴λ_2⁴λ_3⁴) < 0,
∂ζ/∂λ_i = 3√3 λ_i λ_j²λ_k² (2λ_j²λ_k² − λ_i²λ_j² − λ_i²λ_k²) ν^(−5/2),

the inequality ∂θ_1/∂λ_i > 0 holds provided that

(A.6)  λ_j² + λ_k² > λ_j²λ_k² (λ_i²λ_j² + λ_i²λ_k² − 2λ_j²λ_k²) / √(ν³ − 27λ_1⁴λ_2⁴λ_3⁴).

(i) ∂θ_1/∂λ_1 > 0: Formula (A.6) yields

λ_2² + λ_3² > λ_2²λ_3² (λ_2²(λ_1² − λ_3²) + λ_3²(λ_1² − λ_2²)) / √(ν³ − 27λ_1⁴λ_2⁴λ_3⁴).

Putting

f_2 = (λ_2² + λ_3²)² (ν³ − 27λ_1⁴λ_2⁴λ_3⁴) − λ_2⁴λ_3⁴ (λ_2²(λ_1² − λ_3²) + λ_3²(λ_1² − λ_2²))²,

we represent it as a polynomial in terms of a remainder sequence:

f_2 = (((λ_2² − λ_3²)q_23 + r_23)(λ_1² − λ_3²) + r_22)(λ_1² − λ_2²) + r_21,

where the quotient q_23 and the remainders r_23, r_22, r_21 (obtained with MATHEMATICA) are polynomials in λ_1², λ_2², λ_3² that are non-negative under λ_1 ≥ λ_2 ≥ λ_3 > 0; in particular, r_23 = 2λ_1²λ_3¹⁰. Thus f_2 ≥ 0, because we assume λ_1 ≥ λ_2 ≥ λ_3 > 0 as described in Section 1. From Lemma 1, the case f_2 = 0 occurs if and only if λ_1 = λ_2 = λ_3, or equivalently φ = 0. This contradicts 0 < φ < π/2. Consequently f_2 > 0 holds, and thus the inequality ∂θ_1/∂λ_1 > 0 is proved.

(ii) ∂θ_1/∂λ_2 > 0: Formula (A.6) yields

λ_1² + λ_3² > λ_1²λ_3² (λ_2²(λ_1² + λ_3²) − 2λ_1²λ_3²) / √(ν³ − 27λ_1⁴λ_2⁴λ_3⁴).

Case 1. In the case λ_2²(λ_1² + λ_3²) − 2λ_1²λ_3² ≤ 0, the inequality ∂θ_1/∂λ_2 > 0 obviously holds.
Case 2. In the case λ_2²(λ_1² + λ_3²) − 2λ_1²λ_3² > 0, we check the sign of

f_3 = (λ_1² + λ_3²)² (ν³ − 27λ_1⁴λ_2⁴λ_3⁴) − λ_1⁴λ_3⁴ (λ_2²(λ_1² + λ_3²) − 2λ_1²λ_3²)².

We represent f_3 as a polynomial in terms of a remainder sequence:

f_3 = (((λ_2² − λ_3²)q_33 + r_33)(λ_1² − λ_3²) + r_32)(λ_1² − λ_2²) + r_31,

where, as above, the quotient q_33 and the remainders r_33, r_32, r_31 are non-negative under λ_1 ≥ λ_2 ≥ λ_3 > 0. Thus f_3 ≥ 0. From Lemma 1, the case f_3 = 0 occurs if and only if λ_1 = λ_2 = λ_3, or equivalently φ = 0. This contradicts 0 < φ < π/2. Consequently f_3 > 0 holds, and thus the inequality ∂θ_1/∂λ_2 > 0 is proved.
(iii) ∂θ_1/∂λ_3 > 0: Formula (A.6) yields

λ_1² + λ_2² > −λ_1²λ_2² (λ_1²(λ_2² − λ_3²) + λ_2²(λ_1² − λ_3²)) / √(ν³ − 27λ_1⁴λ_2⁴λ_3⁴),

whose right-hand side is nonpositive. Therefore, the inequality ∂θ_1/∂λ_3 > 0 is proved.

A.5. Proof of Theorem 2

From Lemma 1, we see that

Δ = θ_1 − θ_2 = (2/√3) ν^(1/2) ( cos(φ/3) + cos((φ + π)/3) ) = 2ν^(1/2) cos ω,

where ν = λ_1²λ_2² + λ_1²λ_3² + λ_2²λ_3², cos φ = 3√3 λ_1²λ_2²λ_3² ν^(−3/2) (0 < φ < π/2) and

ω = φ/3 + π/6  (π/6 < ω < π/3).

Then we have

∂Δ/∂λ_i = 2λ_i ν^(−1/2) cos ω { (λ_j² + λ_k²) − (2/3) ν tan ω ∂φ/∂λ_i² }.

Since sup tan ω = √3, in order to examine whether ∂Δ/∂λ_i > 0, we check

(A.7)  λ_j² + λ_k² > (2/√3) ν ∂φ/∂λ_i² = 3λ_j²λ_k² (λ_j²(λ_i² − λ_k²) + λ_k²(λ_i² − λ_j²)) / √(ν³ − 27λ_1⁴λ_2⁴λ_3⁴)

for i, j, k = 1, 2, 3; i ≠ j, i ≠ k, j ≠ k.

(i) ∂Δ/∂λ_1 > 0: Formula (A.7) yields

λ_2² + λ_3² > 3λ_2²λ_3² (λ_2²(λ_1² − λ_3²) + λ_3²(λ_1² − λ_2²)) / √(ν³ − 27λ_1⁴λ_2⁴λ_3⁴) ≥ 0.

Therefore, we must check the sign of

f_4 = (λ_2² + λ_3²)² (ν³ − 27λ_1⁴λ_2⁴λ_3⁴) − 9 (λ_2²λ_3² (λ_2²(λ_1² − λ_3²) + λ_3²(λ_1² − λ_2²)))².

We represent f_4 as a polynomial in terms of a remainder sequence:

f_4 = (((λ_2² − λ_3²)q_43 + r_43)(λ_1² − λ_3²) + r_42)(λ_1² − λ_2²) + r_41,

where the quotient q_43 and the remainders r_43, r_42, r_41 are polynomials in λ_1², λ_2², λ_3² that are non-negative under λ_1 ≥ λ_2 ≥ λ_3 > 0; in particular, r_43 = (λ_2² + λ_3²)⁵(λ_1² − λ_2²). Thus f_4 ≥ 0, because we assume λ_1 ≥ λ_2 ≥ λ_3 > 0 as described in Section 1. From Lemma 1, the case f_4 = 0 occurs if and only if λ_1 = λ_2 = λ_3, or equivalently φ = 0. This contradicts 0 < φ < π/2. Consequently f_4 > 0 holds, and thus the inequality ∂Δ/∂λ_1 > 0 is proved.

(ii) ∂Δ/∂λ_2 > 0: Formula (A.7) yields

(A.8)  λ_1² + λ_3² > 3λ_1²λ_3² (λ_2²(λ_1² + λ_3²) − 2λ_1²λ_3²) / √(ν³ − 27λ_1⁴λ_2⁴λ_3⁴).

Case 1. If λ_1²λ_3² (λ_2²(λ_1² + λ_3²) − 2λ_1²λ_3²) ≤ 0, inequality (A.8) obviously holds, because the left-hand side is positive.
Case 2. If λ_1²λ_3² (λ_2²(λ_1² + λ_3²) − 2λ_1²λ_3²) > 0, we check the sign of

f_5 = (λ_1² + λ_3²)² (ν³ − 27λ_1⁴λ_2⁴λ_3⁴) − 9 (λ_1²λ_3² (λ_2²(λ_1² + λ_3²) − 2λ_1²λ_3²))².

We represent f_5 as a polynomial in terms of a remainder sequence:

f_5 = (((λ_2² − λ_3²)q_53 + r_53)(λ_1² − λ_3²) + r_52)(λ_1² − λ_2²) + r_51,

where, as above, the quotient q_53 and the remainders r_53, r_52, r_51 are non-negative under λ_1 ≥ λ_2 ≥ λ_3 > 0. Thus f_5 ≥ 0. From Lemma 1, the case f_5 = 0 occurs if and only if λ_1 = λ_2 = λ_3, or equivalently φ = 0. This contradicts 0 < φ < π/2. Consequently f_5 > 0 holds, and thus the inequality ∂Δ/∂λ_2 > 0 is proved.
(iii) ∂Δ/∂λ_3 > 0: Formula (A.7) yields

λ_1² + λ_2² > −3λ_1²λ_2² (λ_1²(λ_2² − λ_3²) + λ_2²(λ_1² − λ_3²)) / √(ν³ − 27λ_1⁴λ_2⁴λ_3⁴).

This inequality holds because the right-hand side is less than or equal to 0. Thus, the inequality ∂Δ/∂λ_3 > 0 is proved.

A.6. Proof of Proposition 4

Since θ_1 = 1 + (p − 1)α² and θ_2 = 1 − α², we have θ_1 − θ_2 = pα².

(i) Case in which γ ≤ α: From Case (ii) of Lemma 2, we obtain

θ̃_1 = 1 + (1/2)((p − 1)α² + √((p − 1)²α⁴ + 4pα²γ²)),
θ̃_2 = 1 + (1/2)((p − 1)α² − √((p − 1)²α⁴ + 4pα²γ²)),
θ̃_1 − θ̃_2 = α √((p − 1)²α² + 4pγ²).

Therefore, θ̃_1 − θ̃_2 > θ_1 − θ_2 holds if and only if (p − 1)²α² + 4pγ² > p²α². Then we obtain γ > (√(2p − 1)/(2√p)) α.

(ii) Case in which γ > α: From Case (i) of Lemma 2, we have

θ̃_1 − θ̃_2 = (1/2)((p + 1)α² + α √((p − 1)²α² + 4pγ²)).

Evaluating γ by α, we have

θ̃_1 − θ̃_2 > (1/2)((p + 1)α² + α √((p − 1)²α² + 4pα²)) = (p + 1)α².

Hence, we obtain (θ̃_1 − θ̃_2) − (θ_1 − θ_2) > α² > 0.

Combining Cases (i) and (ii), we obtain the conclusion.

Acknowledgements

The authors would like to thank Dr. Yasuhiro Ohta of Hiroshima University for his useful advice. They are grateful to the anonymous reviewers, whose comments and suggestions greatly improved the presentation of the paper. This research was partially supported by the Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research.
References

Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed., John Wiley & Sons.
Anderson, T. W. and Rubin, H. (1956). Statistical inference in factor analysis, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (ed. J. Neyman), 5.
Bentler, P. M. and Kano, Y. (1990). On the equivalence of factors and components, Multivariate Behavioral Research, 25.
ten Berge, J. M. F. and Kiers, H. A. L. (1991). A numerical approach to the approximate and the exact minimum rank of a covariance matrix, Psychometrika, 56.
Beyer, W. H. (1987). CRC Standard Mathematical Tables, 28th ed., CRC Press.
Cattell, R. B. (1966). The scree test for the number of factors, Multivariate Behavioral Research, 1.
Chatfield, C. and Collins, A. J. (1989). Introduction to Multivariate Analysis, Chapman and Hall.
Hoyle, R. H. and Duvall, J. L. (2004). Determining the number of factors in exploratory and confirmatory factor analysis, The Sage Handbook of Quantitative Methodology for the Social Sciences (ed. D. Kaplan), Sage Publications.
Jolliffe, I. T. (2002). Principal Component Analysis, 2nd ed., Springer-Verlag.
Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method, 2nd ed., Butterworth.
Ogasawara, H. (2000). Some relationships between factors and components, Psychometrika, 65.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed., John Wiley & Sons.
Sato, M. (1990). Some remarks on principal component analysis as a substitute for factor analysis in mono-factor cases, J. Japan Statist. Soc., 20.
Sato, M. (1992). A study of an identification problem and substitute use of principal component analysis in factor analysis, Hiroshima Mathematical Journal, 22.
Sato, M. and Ito, M. (2003). Some cautionary notes on factor loadings estimated by principal component analysis, New Developments in Psychometrics (eds. H. Yanai et al.), Springer-Verlag.
Schneeweiss, H. (1997). Factors and principal components in the near spherical case, Multivariate Behavioral Research, 32.
Schneeweiss, H. and Mathes, H. (1995). Factor analysis and principal component analysis, J. Multivariate Anal., 55.
Spearman, C. (1904). "General Intelligence," objectively determined and measured, American Journal of Psychology, 15.
Wolfram, S. (1996). The Mathematica Book, 3rd ed., Wolfram Media/Cambridge University Press.
More informationFactor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA
Factor Analysis Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA 1 Factor Models The multivariate regression model Y = XB +U expresses each row Y i R p as a linear combination
More informationEstimation of Unique Variances Using G-inverse Matrix in Factor Analysis
International Mathematical Forum, 3, 2008, no. 14, 671-676 Estimation of Unique Variances Using G-inverse Matrix in Factor Analysis Seval Süzülmüş Osmaniye Korkut Ata University Vocational High School
More informationIntroduction to Factor Analysis
to Factor Analysis Lecture 11 November 2, 2005 Multivariate Analysis Lecture #11-11/2/2005 Slide 1 of 58 Today s Lecture Factor Analysis. Today s Lecture Exploratory factor analysis (EFA). Confirmatory
More informationExploratory Factor Analysis: dimensionality and factor scores. Psychology 588: Covariance structure and factor models
Exploratory Factor Analysis: dimensionality and factor scores Psychology 588: Covariance structure and factor models How many PCs to retain 2 Unlike confirmatory FA, the number of factors to extract is
More informationCausal Inference Using Nonnormality Yutaka Kano and Shohei Shimizu 1
Causal Inference Using Nonnormality Yutaka Kano and Shohei Shimizu 1 Path analysis, often applied to observational data to study causal structures, describes causal relationship between observed variables.
More informationTAMS39 Lecture 10 Principal Component Analysis Factor Analysis
TAMS39 Lecture 10 Principal Component Analysis Factor Analysis Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content - Lecture Principal component analysis
More informationChapter 2 Polynomial and Rational Functions
Chapter 2 Polynomial and Rational Functions Overview: 2.2 Polynomial Functions of Higher Degree 2.3 Real Zeros of Polynomial Functions 2.4 Complex Numbers 2.5 The Fundamental Theorem of Algebra 2.6 Rational
More informationFACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING
FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING Vishwanath Mantha Department for Electrical and Computer Engineering Mississippi State University, Mississippi State, MS 39762 mantha@isip.msstate.edu ABSTRACT
More informationApplied Multivariate Analysis
Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017 Dimension reduction Exploratory (EFA) Background While the motivation in PCA is to replace the original (correlated) variables
More informationCumulative Review. Name. 13) 2x = -4 13) SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Cumulative Review Name SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. Evaluate the algebraic expression for the given value or values of the variable(s).
More informationMAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS 1. INTRODUCTION
MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS JAN DE LEEUW ABSTRACT. We study the weighted least squares fixed rank approximation problem in which the weight matrices depend on unknown
More informationThe nfactors Package
The nfactors Package July 10, 2006 Type Package Title Non Graphical Solution to the Cattell Scree Test Version 1.0 Date 2006-06-26 Author Gilles Raiche Maintainer Gilles Raiche
More informationRegularized Common Factor Analysis
New Trends in Psychometrics 1 Regularized Common Factor Analysis Sunho Jung 1 and Yoshio Takane 1 (1) Department of Psychology, McGill University, 1205 Dr. Penfield Avenue, Montreal, QC, H3A 1B1, Canada
More informationRV Coefficient and Congruence Coefficient
RV Coefficient and Congruence Coefficient Hervé Abdi 1 1 Overview The congruence coefficient was first introduced by Burt (1948) under the name of unadjusted correlation as a measure of the similarity
More informationSTAT 730 Chapter 9: Factor analysis
STAT 730 Chapter 9: Factor analysis Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Data Analysis 1 / 15 Basic idea Factor analysis attempts to explain the
More informationMULTIVARIATE TIME SERIES ANALYSIS AN ADAPTATION OF BOX-JENKINS METHODOLOGY Joseph N Ladalla University of Illinois at Springfield, Springfield, IL
MULTIVARIATE TIME SERIES ANALYSIS AN ADAPTATION OF BOX-JENKINS METHODOLOGY Joseph N Ladalla University of Illinois at Springfield, Springfield, IL KEYWORDS: Multivariate time series, Box-Jenkins ARIMA
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions
More information22m:033 Notes: 7.1 Diagonalization of Symmetric Matrices
m:33 Notes: 7. Diagonalization of Symmetric Matrices Dennis Roseman University of Iowa Iowa City, IA http://www.math.uiowa.edu/ roseman May 3, Symmetric matrices Definition. A symmetric matrix is a matrix
More informationON A WEIGHTED INTERPOLATION OF FUNCTIONS WITH CIRCULAR MAJORANT
ON A WEIGHTED INTERPOLATION OF FUNCTIONS WITH CIRCULAR MAJORANT Received: 31 July, 2008 Accepted: 06 February, 2009 Communicated by: SIMON J SMITH Department of Mathematics and Statistics La Trobe University,
More informationThe Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)
Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will
More informationFORMAL GROUPS OF CERTAIN Q-CURVES OVER QUADRATIC FIELDS
Sairaiji, F. Osaka J. Math. 39 (00), 3 43 FORMAL GROUPS OF CERTAIN Q-CURVES OVER QUADRATIC FIELDS FUMIO SAIRAIJI (Received March 4, 000) 1. Introduction Let be an elliptic curve over Q. We denote by ˆ
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationConcentration Ellipsoids
Concentration Ellipsoids ECE275A Lecture Supplement Fall 2008 Kenneth Kreutz Delgado Electrical and Computer Engineering Jacobs School of Engineering University of California, San Diego VERSION LSECE275CE
More informationFactor Analysis Continued. Psy 524 Ainsworth
Factor Analysis Continued Psy 524 Ainsworth Equations Extraction Principal Axis Factoring Variables Skiers Cost Lift Depth Powder S1 32 64 65 67 S2 61 37 62 65 S3 59 40 45 43 S4 36 62 34 35 S5 62 46 43
More informationTesting Some Covariance Structures under a Growth Curve Model in High Dimension
Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping
More informationCSL361 Problem set 4: Basic linear algebra
CSL361 Problem set 4: Basic linear algebra February 21, 2017 [Note:] If the numerical matrix computations turn out to be tedious, you may use the function rref in Matlab. 1 Row-reduced echelon matrices
More informationReview (Probability & Linear Algebra)
Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint
More informationNumerical Analysis: Solving Systems of Linear Equations
Numerical Analysis: Solving Systems of Linear Equations Mirko Navara http://cmpfelkcvutcz/ navara/ Center for Machine Perception, Department of Cybernetics, FEE, CTU Karlovo náměstí, building G, office
More informationA classification of sharp tridiagonal pairs. Tatsuro Ito, Kazumasa Nomura, Paul Terwilliger
Tatsuro Ito Kazumasa Nomura Paul Terwilliger Overview This talk concerns a linear algebraic object called a tridiagonal pair. We will describe its features such as the eigenvalues, dual eigenvalues, shape,
More informationMath 215 HW #9 Solutions
Math 5 HW #9 Solutions. Problem 4.4.. If A is a 5 by 5 matrix with all a ij, then det A. Volumes or the big formula or pivots should give some upper bound on the determinant. Answer: Let v i be the ith
More informationCan Variances of Latent Variables be Scaled in Such a Way That They Correspond to Eigenvalues?
International Journal of Statistics and Probability; Vol. 6, No. 6; November 07 ISSN 97-703 E-ISSN 97-7040 Published by Canadian Center of Science and Education Can Variances of Latent Variables be Scaled
More informationFactor analysis. George Balabanis
Factor analysis George Balabanis Key Concepts and Terms Deviation. A deviation is a value minus its mean: x - mean x Variance is a measure of how spread out a distribution is. It is computed as the average
More informationMATH 425-Spring 2010 HOMEWORK ASSIGNMENTS
MATH 425-Spring 2010 HOMEWORK ASSIGNMENTS Instructor: Shmuel Friedland Department of Mathematics, Statistics and Computer Science email: friedlan@uic.edu Last update April 18, 2010 1 HOMEWORK ASSIGNMENT
More informationLEAST SQUARES METHODS FOR FACTOR ANALYSIS. 1. Introduction
LEAST SQUARES METHODS FOR FACTOR ANALYSIS JAN DE LEEUW AND JIA CHEN Abstract. Meet the abstract. This is the abstract. 1. Introduction Suppose we have n measurements on each of m variables. Collect these
More informationMath 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination
Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column
More informationMore Polynomial Equations Section 6.4
MATH 11009: More Polynomial Equations Section 6.4 Dividend: The number or expression you are dividing into. Divisor: The number or expression you are dividing by. Synthetic division: Synthetic division
More informationMULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García. Comunicación Técnica No I-07-13/ (PE/CIMAT)
MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García Comunicación Técnica No I-07-13/11-09-2007 (PE/CIMAT) Multivariate analysis of variance under multiplicity José A. Díaz-García Universidad
More informationMathematical foundations - linear algebra
Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar
More information(4) Statements that contain the word "including" reference content that must be
111.40. Algebra II, Adopted 2012 (One-Half to One Credit). (a) General requirements. Students shall be awarded one-half to one credit for successful completion of this course. Prerequisite: Algebra I.
More informationSTATISTICAL LEARNING SYSTEMS
STATISTICAL LEARNING SYSTEMS LECTURE 8: UNSUPERVISED LEARNING: FINDING STRUCTURE IN DATA Institute of Computer Science, Polish Academy of Sciences Ph. D. Program 2013/2014 Principal Component Analysis
More informationConceptual Questions for Review
Conceptual Questions for Review Chapter 1 1.1 Which vectors are linear combinations of v = (3, 1) and w = (4, 3)? 1.2 Compare the dot product of v = (3, 1) and w = (4, 3) to the product of their lengths.
More informationConsistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis
Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,
More informationA Characterization of Distance-Regular Graphs with Diameter Three
Journal of Algebraic Combinatorics 6 (1997), 299 303 c 1997 Kluwer Academic Publishers. Manufactured in The Netherlands. A Characterization of Distance-Regular Graphs with Diameter Three EDWIN R. VAN DAM
More information1. Matrix multiplication and Pauli Matrices: Pauli matrices are the 2 2 matrices. 1 0 i 0. 0 i
Problems in basic linear algebra Science Academies Lecture Workshop at PSGRK College Coimbatore, June 22-24, 2016 Govind S. Krishnaswami, Chennai Mathematical Institute http://www.cmi.ac.in/~govind/teaching,
More information= W z1 + W z2 and W z1 z 2
Math 44 Fall 06 homework page Math 44 Fall 06 Darij Grinberg: homework set 8 due: Wed, 4 Dec 06 [Thanks to Hannah Brand for parts of the solutions] Exercise Recall that we defined the multiplication of
More informationMidterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015
Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015 The test lasts 1 hour and 15 minutes. No documents are allowed. The use of a calculator, cell phone or other equivalent electronic
More informationA note on structured means analysis for a single group. André Beauducel 1. October 3 rd, 2015
Structured means analysis for a single group 1 A note on structured means analysis for a single group André Beauducel 1 October 3 rd, 2015 Abstract The calculation of common factor means in structured
More informationMore Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction
Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order
More informationRemarks on the Cayley Representation of Orthogonal Matrices and on Perturbing the Diagonal of a Matrix to Make it Invertible
Remarks on the Cayley Representation of Orthogonal Matrices and on Perturbing the Diagonal of a Matrix to Make it Invertible Jean Gallier Department of Computer and Information Science University of Pennsylvania
More informationMultilevel Analysis, with Extensions
May 26, 2010 We start by reviewing the research on multilevel analysis that has been done in psychometrics and educational statistics, roughly since 1985. The canonical reference (at least I hope so) is
More informationReview (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology
Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna
More informationSemester Review Packet
MATH 110: College Algebra Instructor: Reyes Semester Review Packet Remarks: This semester we have made a very detailed study of four classes of functions: Polynomial functions Linear Quadratic Higher degree
More information1.1 Limits and Continuity. Precise definition of a limit and limit laws. Squeeze Theorem. Intermediate Value Theorem. Extreme Value Theorem.
STATE EXAM MATHEMATICS Variant A ANSWERS AND SOLUTIONS 1 1.1 Limits and Continuity. Precise definition of a limit and limit laws. Squeeze Theorem. Intermediate Value Theorem. Extreme Value Theorem. Definition
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More informationVectors and Matrices Statistics with Vectors and Matrices
Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc
More informationLECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS
LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal
More informationAnalysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems
Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA
More informationK-ANTITHETIC VARIATES IN MONTE CARLO SIMULATION ISSN k-antithetic Variates in Monte Carlo Simulation Abdelaziz Nasroallah, pp.
K-ANTITHETIC VARIATES IN MONTE CARLO SIMULATION ABDELAZIZ NASROALLAH Abstract. Standard Monte Carlo simulation needs prohibitive time to achieve reasonable estimations. for untractable integrals (i.e.
More informationMaths for Signals and Systems Linear Algebra in Engineering
Maths for Signals and Systems Linear Algebra in Engineering Lectures 13 15, Tuesday 8 th and Friday 11 th November 016 DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE
More informationUnderstanding hard cases in the general class group algorithm
Understanding hard cases in the general class group algorithm Makoto Suwama Supervisor: Dr. Steve Donnelly The University of Sydney February 2014 1 Introduction This report has studied the general class
More informationAn Introduction to Multivariate Statistical Analysis
An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents
More informationMath 121 Calculus 1 Fall 2009 Outcomes List for Final Exam
Math 121 Calculus 1 Fall 2009 Outcomes List for Final Exam This outcomes list summarizes what skills and knowledge you should have reviewed and/or acquired during this entire quarter in Math 121, and what
More informationMiller Objectives Alignment Math
Miller Objectives Alignment Math 1050 1 College Algebra Course Objectives Spring Semester 2016 1. Use algebraic methods to solve a variety of problems involving exponential, logarithmic, polynomial, and
More informationMath 307 Learning Goals. March 23, 2010
Math 307 Learning Goals March 23, 2010 Course Description The course presents core concepts of linear algebra by focusing on applications in Science and Engineering. Examples of applications from recent
More informationApplied Linear Algebra in Geoscience Using MATLAB
Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in
More information2. The Concept of Convergence: Ultrafilters and Nets
2. The Concept of Convergence: Ultrafilters and Nets NOTE: AS OF 2008, SOME OF THIS STUFF IS A BIT OUT- DATED AND HAS A FEW TYPOS. I WILL REVISE THIS MATE- RIAL SOMETIME. In this lecture we discuss two
More informationWEIGHTED COMPOSITION OPERATORS BETWEEN H AND THE BLOCH SPACE. Sh^uichi Ohno 1. INTRODUCTION
TAIWANESE JOURNAL OF MATHEMATICS Vol. 5, No. 3, pp. 555-563, September 2001 This paper is available online at http://www.math.nthu.edu.tw/tjm/ WEIGHTED COMPOSITION OPERATORS BETWEEN H AND THE BLOCH SPACE
More informationMath 315: Linear Algebra Solutions to Assignment 7
Math 5: Linear Algebra s to Assignment 7 # Find the eigenvalues of the following matrices. (a.) 4 0 0 0 (b.) 0 0 9 5 4. (a.) The characteristic polynomial det(λi A) = (λ )(λ )(λ ), so the eigenvalues are
More informationHigh-dimensional two-sample tests under strongly spiked eigenvalue models
1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data
More informationVectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =
Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.
More informationComputing a Lower Bound for the Canonical Height on Elliptic Curves over Q
Computing a Lower Bound for the Canonical Height on Elliptic Curves over Q John Cremona 1 and Samir Siksek 2 1 School of Mathematical Sciences, University of Nottingham, University Park, Nottingham NG7
More informationSparse orthogonal factor analysis
Sparse orthogonal factor analysis Kohei Adachi and Nickolay T. Trendafilov Abstract A sparse orthogonal factor analysis procedure is proposed for estimating the optimal solution with sparse loadings. In
More informationMTH 2032 SemesterII
MTH 202 SemesterII 2010-11 Linear Algebra Worked Examples Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education December 28, 2011 ii Contents Table of Contents
More informationFixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility
American Economic Review: Papers & Proceedings 2016, 106(5): 400 404 http://dx.doi.org/10.1257/aer.p20161082 Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility By Gary Chamberlain*
More informationDiscussion of Hypothesis testing by convex optimization
Electronic Journal of Statistics Vol. 9 (2015) 1 6 ISSN: 1935-7524 DOI: 10.1214/15-EJS990 Discussion of Hypothesis testing by convex optimization Fabienne Comte, Céline Duval and Valentine Genon-Catalot
More informationMINIMAL NORMAL AND COMMUTING COMPLETIONS
INTERNATIONAL JOURNAL OF INFORMATION AND SYSTEMS SCIENCES Volume 4, Number 1, Pages 5 59 c 8 Institute for Scientific Computing and Information MINIMAL NORMAL AND COMMUTING COMPLETIONS DAVID P KIMSEY AND
More informationChapter 3: Root Finding. September 26, 2005
Chapter 3: Root Finding September 26, 2005 Outline 1 Root Finding 2 3.1 The Bisection Method 3 3.2 Newton s Method: Derivation and Examples 4 3.3 How To Stop Newton s Method 5 3.4 Application: Division
More informationAn alternating optimization algorithm for two-channel factor analysis with common and uncommon factors
An alternating optimization algorithm for two-channel factor analysis with common and uncommon factors David Ramírez University Carlos III of Madrid, Spain Gregorio Marañón Health Research Institute, Spain
More informationIn Z: x + 3 = 2 3x = 2 x = 1 No solution In Q: 3x = 2 x 2 = 2. x = 2 No solution. In R: x 2 = 2 x = 0 x = ± 2 No solution Z Q.
THE UNIVERSITY OF NEW SOUTH WALES SCHOOL OF MATHEMATICS AND STATISTICS MATH 1141 HIGHER MATHEMATICS 1A ALGEBRA. Section 1: - Complex Numbers. 1. The Number Systems. Let us begin by trying to solve various
More informationPrincipal Component Analysis (PCA) Theory, Practice, and Examples
Principal Component Analysis (PCA) Theory, Practice, and Examples Data Reduction summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables. p k n A
More informationResearch Article Minor Prime Factorization for n-d Polynomial Matrices over Arbitrary Coefficient Field
Complexity, Article ID 6235649, 9 pages https://doi.org/10.1155/2018/6235649 Research Article Minor Prime Factorization for n-d Polynomial Matrices over Arbitrary Coefficient Field Jinwang Liu, Dongmei
More informationMaximal perpendicularity in certain Abelian groups
Acta Univ. Sapientiae, Mathematica, 9, 1 (2017) 235 247 DOI: 10.1515/ausm-2017-0016 Maximal perpendicularity in certain Abelian groups Mika Mattila Department of Mathematics, Tampere University of Technology,
More informationClassification of root systems
Classification of root systems September 8, 2017 1 Introduction These notes are an approximate outline of some of the material to be covered on Thursday, April 9; Tuesday, April 14; and Thursday, April
More informationTORIC WEAK FANO VARIETIES ASSOCIATED TO BUILDING SETS
TORIC WEAK FANO VARIETIES ASSOCIATED TO BUILDING SETS YUSUKE SUYAMA Abstract. We give a necessary and sufficient condition for the nonsingular projective toric variety associated to a building set to be
More informationMultivariate Statistics (I) 2. Principal Component Analysis (PCA)
Multivariate Statistics (I) 2. Principal Component Analysis (PCA) 2.1 Comprehension of PCA 2.2 Concepts of PCs 2.3 Algebraic derivation of PCs 2.4 Selection and goodness-of-fit of PCs 2.5 Algebraic derivation
More informationIntroduction to Matrix Algebra
Introduction to Matrix Algebra August 18, 2010 1 Vectors 1.1 Notations A p-dimensional vector is p numbers put together. Written as x 1 x =. x p. When p = 1, this represents a point in the line. When p
More informationLOWER BOUNDS FOR THE MAXIMUM NUMBER OF SOLUTIONS GENERATED BY THE SIMPLEX METHOD
Journal of the Operations Research Society of Japan Vol 54, No 4, December 2011, pp 191 200 c The Operations Research Society of Japan LOWER BOUNDS FOR THE MAXIMUM NUMBER OF SOLUTIONS GENERATED BY THE
More informationDependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.
Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,
More informationHessenberg Pairs of Linear Transformations
Hessenberg Pairs of Linear Transformations Ali Godjali November 21, 2008 arxiv:0812.0019v1 [math.ra] 28 Nov 2008 Abstract Let K denote a field and V denote a nonzero finite-dimensional vector space over
More informationPrincipal Component Analysis & Factor Analysis. Psych 818 DeShon
Principal Component Analysis & Factor Analysis Psych 818 DeShon Purpose Both are used to reduce the dimensionality of correlated measurements Can be used in a purely exploratory fashion to investigate
More information