
Multivariate Statistics
Chapter 4: Factor analysis
Pedro Galeano
Departamento de Estadística, Universidad Carlos III de Madrid
pedro.galeano@uc3m.es
Course 2017/2018, Master in Mathematical Engineering

1 Introduction
2 The factor model
3 Estimation of the loading matrix and the factor scores

Introduction

It is not always possible to measure the quantities of interest directly. In Psychology, for instance, intelligence is a prime example: scores in subjects such as Mathematics, Language and Literature, or comprehensive tests, are usually used to describe a child's intelligence. One may wonder whether it is possible to define a certain score to measure a person's intelligence from these measurements.

Introduction

Factor analysis (FA) is a statistical technique that recognizes that there is an association between certain measured quantities and some hidden quantities. More precisely, the main aims of FA are:
1 to exhibit the relationship between the measured and the underlying variables; and
2 to estimate the underlying variables, usually called hidden or latent variables.

Introduction

The model-based nature of FA has invited, and resulted in, many theoretical and statistical advances. In particular, the factor model allows an elegant description of:
1 the underlying structure among the observable variables;
2 the hidden variables; and
3 the relationship between both types of variables.

Introduction

FA applies to data with fewer hidden variables than measured quantities, although the number of hidden variables need not be known explicitly in advance. It is not possible to determine the hidden variables uniquely; typically, we focus on expressing the observed variables in terms of a smaller number of uncorrelated factors. There is a strong connection with Principal Component Analysis (PCA), and we shall see that a principal component decomposition leads to one possible FA solution. However, there are important distinctions between PCA and FA.

Introduction

The rest of this chapter is devoted to presenting the factor model and its main characteristics, and to showing how to estimate the parameters of the model and the hidden factors.

The factor model

Let x = (x1, ..., xp)′ be a multivariate random variable with mean vector μx and covariance matrix Σx. Note that we do not assume any distribution for the random variable x. However, one of the estimation methods presented later assumes Gaussianity of x.

The factor model

The factor model establishes that x can be written as follows:

x = μx + L f + ɛ

where:
1 μx = E[x] is a p-dimensional mean vector;
2 L is a p × r matrix of unknown constants called the loading matrix;
3 f is an r-dimensional random vector of latent variables (the factors), with r < p, mean vector 0r and covariance matrix Ir;
4 ɛ is a p-dimensional unobserved vector of errors with mean vector 0p and diagonal covariance matrix Σɛ; and
5 the errors are assumed to be uncorrelated with the factors, i.e., Cov[f, ɛ] = E[f ɛ′] = 0r×p and Cov[ɛ, f] = E[ɛ f′] = 0p×r.

The factor model

As in PCA, the driving force of the factor model is Σx. From the factor model, it is not difficult to see that:

Σx = Cov[x] = E[(x − μx)(x − μx)′] = L L′ + Σɛ

Therefore, Σx can be written in terms of L and Σɛ.
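The decomposition Σx = L L′ + Σɛ can be checked by simulation. The following sketch uses an illustrative loading matrix and error variances of our own choosing (not taken from the slides), draws data from the factor model, and compares the sample covariance with L L′ + Σɛ:

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, n = 4, 2, 200_000

# Illustrative parameters (not from the slides): a p x r loading matrix
# and a diagonal error covariance matrix.
L = np.array([[0.9, 0.0],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.0, 0.6]])
Sigma_eps = np.diag([0.20, 0.30, 0.40, 0.25])

# Simulate x = mu_x + L f + eps with mu_x = 0, f ~ N(0, I_r), eps ~ N(0, Sigma_eps)
f = rng.standard_normal((n, r))
eps = rng.standard_normal((n, p)) * np.sqrt(np.diag(Sigma_eps))
x = f @ L.T + eps

Sigma_x = L @ L.T + Sigma_eps        # implied covariance L L' + Sigma_eps
S = np.cov(x, rowvar=False)          # sample covariance of the simulated data
print(np.abs(S - Sigma_x).max())     # close to 0 for large n
```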

The factor model

Consequently, the variances of the variables in x are given by:

σ²x,j = Lj Lj′ + σ²ɛ,j = τjj + σ²ɛ,j,  for j = 1, ..., p,

where:
1 τjj = Lj Lj′, with Lj the j-th row of L, is called the j-th communality; and
2 σ²ɛ,j, the variance of the j-th element of ɛ, is called the j-th uniqueness.

Additionally, the covariances of the variables in x are given by:

σx,jk = Lj Lk′ = τjk,  for j, k = 1, ..., p with j ≠ k.

The factor model

The loading matrix, L, has a simple interpretation. Note that:

Cov[x, f] = E[(x − μx) f′] = E[(L f + ɛ) f′] = L

Thus, L is the covariance between x (the multivariate random variable of interest) and f (the latent factors).

The factor model

In the factor model, neither the loading matrix, L, nor the factors, f, are observable. This poses the problem of indeterminacy: if H is an r × r orthogonal matrix, then the factor model can be written as:

x = μx + L f + ɛ = μx + L H H′ f + ɛ = μx + L* f* + ɛ

where L* = LH and f* = H′f. Both models contain uncorrelated factors with an identity covariance matrix. As a consequence, the loading matrix and the factors are, at best, unique up to an orthogonal transformation: an orthogonal transformation of the factors leads to another set of factors, and a similar relationship holds for the loading matrix. Indeed, we cannot uniquely recover the loading matrix or the factors from knowledge of the covariance matrix Σx, as the next example illustrates.

The factor model

Assume a one-factor model for a bivariate random variable x = (x1, x2)′ with mean 02 and covariance matrix:

Σx = [ 1.25  0.5
        0.5  0.5 ]

The one-factor model is:

x = [ x1 ] = [ L11 ] f + [ ɛ1 ]
    [ x2 ]   [ L21 ]     [ ɛ2 ]

so that f is a univariate random variable.

The factor model

Therefore, Σx can be written as follows:

Σx = [ 1.25  0.5 ] = [ L11 ] [ L11  L21 ] + [ σ²ɛ,11    0    ]
     [ 0.5   0.5 ]   [ L21 ]               [   0    σ²ɛ,22 ]

   = [ L11²     L11 L21 ] + [ σ²ɛ,11    0    ]
     [ L11 L21  L21²    ]   [   0    σ²ɛ,22 ]

Consequently:

1.25 = L11² + σ²ɛ,11
0.5  = L11 L21
0.5  = L21² + σ²ɛ,22

A solution for L is L11 = 1 and L21 = 0.5. Another option is L11 = 0.75 and L21 = 0.66.
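Both candidate solutions can be verified numerically. A small sketch (`check` is our own helper name) plugs each pair (L11, L21) into the three equations above and also reports the trace of the implied Σɛ:

```python
import numpy as np

Sigma_x = np.array([[1.25, 0.5],
                    [0.5,  0.5]])

def check(L11, L21):
    # Uniquenesses implied by the two diagonal equations
    s11 = 1.25 - L11 ** 2
    s22 = 0.5 - L21 ** 2
    L = np.array([[L11], [L21]])
    Sigma_eps = np.diag([s11, s22])
    ok = np.allclose(L @ L.T + Sigma_eps, Sigma_x, atol=1e-6)
    return ok, np.trace(Sigma_eps)

print(check(1.0, 0.5))   # both solutions reproduce Sigma_x
print(check(0.75, 2/3))  # (0.66 in the text is 2/3 rounded)
```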

The factor model

Unless other information is available, it is not clear which solution one should pick. From an intuitive point of view, we might prefer the solution with the smallest error covariance matrix Σɛ, as measured by the trace of Σɛ. In this case, the solution with L11 = 1 and L21 = 0.5 would be preferable. The varimax criterion is a method for distinguishing between loading matrices that is easy to calculate and interpret.

The factor model

Let:

VC(L) = (1/r) Σ_{k=1}^{r} [ (1/p) Σ_{j=1}^{p} L⁴jk − ( (1/p) Σ_{j=1}^{p} L²jk )² ]

The quantity VC(L) reminds us of a sample variance, with the difference that it is applied to the squared entries L²jk of L. Starting with a loading matrix L, we can consider rotated loading matrices L* = LH, where H is an orthogonal r × r matrix. The varimax criterion selects the orthogonal r × r matrix:

H* = arg max_H VC(LH)

which leads to the loading matrix L* = LH*. As we shall see, varimax-optimal rotations lead to visualizations of the loading matrix that admit an easier interpretation than unrotated loading matrices.

The factor model

In the previous example, it is easy to show that VC(L) = 0.1406 for the first solution, while VC(L) = 0.0035 for the second solution. Then, the first solution is the one selected by the varimax criterion. The two ways of choosing the loading matrix, namely finding the loading matrix L with the smaller trace of Σɛ or finding the one with the larger VC, are not equivalent; however, in this example, both resulted in the same solution. Note that, in this example, we have not found the optimal orthogonal matrix H*: we have just compared two possible solutions.
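The two VC values quoted above can be reproduced directly from the definition. A sketch (`varimax_criterion` is our own helper name):

```python
import numpy as np

def varimax_criterion(L):
    """VC(L): average over the r columns of the variance of the squared loadings."""
    L2 = np.asarray(L, dtype=float) ** 2
    return float(np.mean(np.mean(L2 ** 2, axis=0) - np.mean(L2, axis=0) ** 2))

# The two one-factor solutions from the example (r = 1, p = 2)
print(round(varimax_criterion([[1.0], [0.5]]), 4))   # 0.1406
print(round(varimax_criterion([[0.75], [2/3]]), 4))  # 0.0035
```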

The factor model

If the univariate variables in x = (x1, ..., xp)′ have different units of measurement, it is preferable to consider the scaled variables. The univariate standardization of the variables in x leads to the multivariate random variable y = Δx^{-1/2}(x − μx), where Δx is the diagonal matrix containing the variances of the variables in x. The multivariate random variable y has mean 0p and covariance matrix ϱx, the correlation matrix of x.

The factor model

Then, if x follows the factor model given by:

x = μx + L f + ɛ

with the properties given earlier, then y follows the factor model:

y = Δx^{-1/2} L f + Δx^{-1/2} ɛ

with covariance matrix decomposition:

ϱx = Δx^{-1/2} L L′ Δx^{-1/2} + Δx^{-1/2} Σɛ Δx^{-1/2}

The factor model

As a consequence, the factor model for y is similar to the factor model for x, with:
1 a loading matrix M = Δx^{-1/2} L;
2 a set of factors f (the same factors); and
3 a set of errors ε = Δx^{-1/2} ɛ with diagonal covariance matrix Σε = Δx^{-1/2} Σɛ Δx^{-1/2}.

In other words, we have the factor model for y given by:

y = M f + ε

with covariance matrix decomposition:

ϱx = M M′ + Σε

Working with ϱx is advantageous because the diagonal entries of ϱx are 1, so the communality and uniqueness of each variable sum to 1, and their interpretation becomes easier.

Estimation of the loading matrix and the factor scores

In practice, we have a data matrix X of dimension n × p such that each row of X, xi′, for i = 1, ..., n, has been generated by the factor model. Given the data matrix X, the goal is to estimate the loading matrix, L, the covariance matrix of the errors, Σɛ, and the values of the factors for each observation xi, for i = 1, ..., n, which are called the factor scores. There are two main approaches to these goals: non-distributional methods and distributional methods. For Gaussian data, we expect distributional methods that explicitly assume Gaussianity of the data to work better than non-distributional methods. Indeed, methods based on Gaussian assumptions still work well if the distribution of the data does not deviate too much from Gaussianity.

Estimation of the loading matrix and the factor scores

There are two main non-distributional methods:
1 principal component factor analysis; and
2 principal factor analysis.

Both methods are based on:
1 the covariance matrix decomposition Σx = L L′ + Σɛ, if we work with x; or
2 the correlation matrix decomposition ϱx = M M′ + Σε, if we work with the scaled variable y.

For simplicity, we present the methods for x and the covariance decomposition, but similar results are obtained for y and the correlation decomposition.

Estimation of the loading matrix and the factor scores

We start with the principal component factor analysis method. First, the spectral decomposition of Σx is given by:

Σx = Vp Λp Vp′

where:
Vp is the p × p matrix that contains the eigenvectors of Σx; and
Λp is the p × p diagonal matrix that contains the eigenvalues of Σx.

Consequently, from the covariance matrix decomposition, we have that:

Vp Λp Vp′ = L L′ + Σɛ

Estimation of the loading matrix and the factor scores

If we assume that Σɛ = 0p×p, then Vp Λp Vp′ = L L′. Now, as L has dimension p × r, L L′ is a matrix with rank r < p. As a consequence, Λp contains p − r eigenvalues equal to 0, so that we can put:

L = Vr Λr^{1/2}

where:
Vr is the matrix that contains the eigenvectors of Σx associated with its r nonzero eigenvalues; and
Λr is the r × r diagonal matrix that contains these eigenvalues.

Estimation of the loading matrix and the factor scores

Now, given the data matrix X, we can compute the sample covariance matrix Sx (or the sample correlation matrix Rx, if the data have been scaled). Thus, the eigenvectors and eigenvalues of Σx (or ϱx) are replaced with those of Sx (or Rx). The idea is to select r as in PCA, i.e., using the variance explained by the principal components obtained from Sx or Rx. Of course, in most situations the assumption Σɛ = 0p×p is unrealistic, but the method usually provides good estimates of the loading matrix.
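The principal component factor analysis estimator can be sketched on a sample correlation matrix as follows (toy data of our own making; the loadings are the leading eigenvectors scaled by the square roots of the eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 200, 5, 2

# Toy data with correlated columns (illustrative only)
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
R = np.corrcoef(X, rowvar=False)          # sample correlation matrix R_x

# Spectral decomposition; eigh returns ascending order, so flip to descending
eigval, eigvec = np.linalg.eigh(R)
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]

M = eigvec[:, :r] * np.sqrt(eigval[:r])   # loading estimate V_r Lambda_r^{1/2}
communalities = np.diag(M @ M.T)
uniquenesses = 1.0 - communalities        # diagonal of R - M M'
```

Since the diagonal entries of R are 1, each communality and its uniqueness sum to 1, as noted earlier for the correlation decomposition.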

Illustrative example (I)

Consider the eight univariate variables measured on the 50 states of the USA. We took the logarithm of the first, third and eighth variables. Additionally, we consider the sample correlation matrix rather than the sample covariance matrix. The first three eigenvectors explain 76.69% of the total variability. Then, using the three largest eigenvalues of the correlation matrix and the associated eigenvectors, we can estimate the loading matrix, M (note that we are using the correlation matrix). For that, we use the formula:

M̂ = V3^{Rx} (Λ3^{Rx})^{1/2}

where:
V3^{Rx} is the matrix with the first three eigenvectors of Rx; and
Λ3^{Rx} is the diagonal matrix with the three largest eigenvalues of Rx.

Illustrative example (I)

The estimated loading matrix is given by:

M̂ = [ 0.44  0.47  0.53
      0.52  0.54  0.26
      0.87  0.04  0.13
      0.76  0.05  0.41
      0.84  0.31  0.23
      0.80  0.41  0.07
      0.69  0.25  0.40
      0.06  0.67  0.61 ]

Illustrative example (I)

Now, we can use the varimax rotation to obtain the final estimate of the loading matrix, which is given by:

M̂ = [ 0.00  0.00  0.84
      0.78  0.10  0.12
      0.66  0.00  0.58
      0.76  0.38  0.16
      0.57  0.52  0.51
      0.82  0.20  0.32
      0.28  0.00  0.79
      0.00  0.90  0.00 ]

Illustrative example (I)

The first factor distinguishes cold states with rich, educated, long-lived and non-violent populations from warm states with poor, ill-educated, short-lived and violent populations. The second factor distinguishes big states with rich and educated, but violent and short-lived, populations from small states with long-lived people. The third factor distinguishes populated and violent states from less-populated, cold and non-violent, but short-lived, states.

Illustrative example (I)

Once we have estimated the loading matrix, it is possible to estimate the covariance matrix of the errors, Σε, with the diagonal of Rx − M̂ M̂′. In this case, the estimated uniquenesses are 0.28, 0.35, 0.21, 0.24, 0.12, 0.17, 0.29 and 0.16. Therefore, the variables best explained by the factors are murder, HS graduates and log-area. Estimation of the values of the factors (the factor scores) for each state will be given later.

Estimation of the loading matrix and the factor scores

In principal factor analysis, we also start with the equality:

Σx = L L′ + Σɛ

Therefore, L L′ = Σx − Σɛ must be a matrix of rank r < p, because L has dimension p × r. As a consequence, Σx − Σɛ has p − r eigenvalues equal to 0. Thus, the spectral decomposition of Σx − Σɛ is given by:

Σx − Σɛ = Ur Ωr Ur′

where:
Ur is the matrix that contains the eigenvectors of Σx − Σɛ associated with its r nonzero eigenvalues; and
Ωr is the r × r diagonal matrix that contains these eigenvalues.

Therefore, we can put:

L = Ur Ωr^{1/2}

Estimation of the loading matrix and the factor scores

As in the principal component factor analysis method, the covariance matrix Σx (or the correlation matrix ϱx, if the data have been scaled) is replaced with the sample covariance matrix Sx (or the sample correlation matrix Rx). The problem with this method is that Σɛ is unknown and has to be estimated. For that, we can use the estimate of Σɛ obtained with the principal component factor analysis method. Thus, the eigenvectors and eigenvalues of Σx − Σɛ (or ϱx − Σε) are replaced with those of Sx − Σ̂ɛ (or Rx − Σ̂ε).
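The two-step procedure can be sketched as follows. The correlation matrix here is built from an illustrative two-factor model of our own choosing; step 1 obtains Σ̂ε from principal component loadings, and step 2 eigendecomposes the reduced matrix:

```python
import numpy as np

def pc_loadings(R, r):
    """Principal component factor loadings V_r Lambda_r^{1/2} from a correlation matrix."""
    vals, vecs = np.linalg.eigh(R)
    vals, vecs = vals[::-1], vecs[:, ::-1]   # descending eigenvalue order
    return vecs[:, :r] * np.sqrt(vals[:r])

# A correlation matrix built from a known two-factor model (illustrative values)
M_true = np.array([[0.9, 0.0], [0.8, 0.3], [0.7, 0.4], [0.2, 0.8], [0.1, 0.7]])
R = M_true @ M_true.T + np.diag(1.0 - np.diag(M_true @ M_true.T))
r = 2

# Step 1: first-pass uniquenesses from the principal component method
M0 = pc_loadings(R, r)
Sigma_eps_hat = np.diag(np.diag(R - M0 @ M0.T))

# Step 2: principal factor loadings U_r Omega_r^{1/2} from R - Sigma_eps_hat
vals, vecs = np.linalg.eigh(R - Sigma_eps_hat)
vals, vecs = vals[::-1], vecs[:, ::-1]
M = vecs[:, :r] * np.sqrt(np.clip(vals[:r], 0.0, None))  # clip tiny negative eigenvalues
```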

Illustrative example (I)

Consider again the eight univariate variables measured on the 50 states of the USA. Note that we are working with the sample correlation matrix, Rx. As with the principal component factor analysis method, we consider three factors, because the first three eigenvectors of Rx − Σ̂ε explain 88.8% of the total variability. Then, using the three largest eigenvalues and the associated eigenvectors, we can estimate the loading matrix, M. As in the previous method, we use the formula:

M̂ = U3^{Rx−Σ̂ε} (Ω3^{Rx−Σ̂ε})^{1/2}

where:
U3^{Rx−Σ̂ε} is the matrix with the first three eigenvectors of Rx − Σ̂ε; and
Ω3^{Rx−Σ̂ε} is the diagonal matrix with the three largest eigenvalues of Rx − Σ̂ε.

Illustrative example (I)

The estimated loading matrix is given by:

M̂ = [ 0.42  0.27  0.56
      0.48  0.36  0.32
      0.84  0.08  0.10
      0.73  0.02  0.36
      0.84  0.34  0.12
      0.78  0.40  0.04
      0.66  0.12  0.40
      0.06  0.77  0.36 ]

Illustrative example (I)

Now, we can use the varimax rotation to obtain the final estimate of the loading matrix, which is given by:

M̂ = [ 0.00  0.00  0.75
      0.67  0.00  0.00
      0.65  0.00  0.56
      0.74  0.30  0.17
      0.59  0.47  0.51
      0.80  0.21  0.30
      0.28  0.00  0.73
      0.00  0.85  0.00 ]

Illustrative example (I)

Note that both matrices are very close to those estimated with the principal component factor analysis method. Consequently, the interpretation of the factors is similar to that obtained with the previous estimation method.

Illustrative example (I)

As in the previous case, once we have estimated the loading matrix, it is possible to estimate the covariance matrix of the errors, Σɛ, with the diagonal of Rx − M̂ M̂′. In this case, the estimated uniquenesses are 0.42, 0.52, 0.26, 0.31, 0.15, 0.21, 0.38 and 0.26. Therefore, the variables best explained by the factors are log-illiteracy, murder, HS graduates and log-area.

Estimation of the loading matrix and the factor scores

We focus now on estimating the factor scores. The Bartlett factor scores method can be used with these non-distributional methods. Essentially, the Bartlett method treats the factors as parameters. Then, from the factor model, the p × 1 vector xi, for i = 1, ..., n, is given by:

xi = μx + L fi + ɛi

Therefore, by generalized least squares:

f̂i = (L′ Σɛ⁻¹ L)⁻¹ L′ Σɛ⁻¹ (xi − μx)

The final estimate is obtained after replacing μx, L and Σɛ by their respective estimates.
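A minimal sketch of the Bartlett GLS scores, exploiting the diagonality of Σɛ (`bartlett_scores` is our own helper name). In the noise-free case x = μx + L f, the formula recovers the factors exactly, since (L′Σɛ⁻¹L)⁻¹L′Σɛ⁻¹L = I:

```python
import numpy as np

def bartlett_scores(X, mu, L, sigma_eps_diag):
    """GLS scores f_i = (L' Se^-1 L)^-1 L' Se^-1 (x_i - mu), with Se diagonal."""
    W = L.T / sigma_eps_diag          # L' Sigma_eps^{-1} (row-wise division)
    A = np.linalg.solve(W @ L, W)     # (L' Sigma_eps^{-1} L)^{-1} L' Sigma_eps^{-1}
    return (X - mu) @ A.T             # one r-dimensional score per observation

# Noise-free check with illustrative parameters: scores equal the true factors
rng = np.random.default_rng(2)
L = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.0, 0.6]])
mu = np.array([1.0, -1.0, 0.5, 0.0])
f = rng.standard_normal((10, 2))
X = mu + f @ L.T                      # epsilon = 0
print(np.allclose(bartlett_scores(X, mu, L, np.full(4, 0.2)), f))  # True
```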

Estimation of the loading matrix and the factor scores

If we have used the scaled data, the estimated factor scores are given by:

f̂i = (M′ Σε⁻¹ M)⁻¹ M′ Σε⁻¹ yi

where the yi are the scaled observations. Note that L and M are replaced with L* and M* if we have used the varimax rotation.

Illustrative example (I)

The next three figures show scatterplots of the three factors estimated with the previous methods. For a better comparison, the sign of the second factor for the principal factor analysis method has been changed. The three plots appear to confirm the interpretation of the factors given before. Note that Alaska can be seen as a kind of outlier.

Illustrative example (I)

[Figure: scatterplots of the factor scores of the 50 states, first factor (horizontal) versus second factor (vertical); left panel: scores with the first method; right panel: scores with the second method.]

Illustrative example (I)

[Figure: scatterplots of the factor scores of the 50 states, first factor (horizontal) versus third factor (vertical); left panel: scores with the first method; right panel: scores with the second method.]

Illustrative example (I)

[Figure: scatterplots of the factor scores of the 50 states, second factor (horizontal) versus third factor (vertical); left panel: scores with the first method; right panel: scores with the second method.]

Estimation of the loading matrix and the factor scores

Principal component factor analysis and principal factor analysis are non-parametric procedures, so they can be applied without any knowledge of the underlying distribution of the data.

If we know that the data are Gaussian, or not very different from Gaussian, exploiting this extra knowledge may lead to better estimators of the loading matrix and the factor scores.

We now consider the case in which $x$ and $\epsilon$ are Gaussian distributed and use maximum likelihood to estimate the model parameters.

Estimation of the loading matrix and the factor scores

Given a data matrix generated from the Gaussian factor model, the log-likelihood function of the parameters of the model is given by:

$$\ell(\mu_x, \Sigma_x \mid X) = -\frac{np}{2} \log 2\pi - \frac{n}{2} \log |\Sigma_x| - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu_x)' \Sigma_x^{-1} (x_i - \mu_x)$$

where $\Sigma_x = L L' + \Sigma_\epsilon$.

As seen in Chapter 2, the MLE of $\mu_x$ is given by $\widehat{\mu}_x = \bar{x}$. Replacing this quantity in $\ell(\mu_x, \Sigma_x \mid X)$ leads to:

$$\ell(\Sigma_x \mid X, \widehat{\mu}_x = \bar{x}) = -\frac{np}{2} \log 2\pi - \frac{n}{2} \log |\Sigma_x| - \frac{n-1}{2} \operatorname{Tr}\!\left[\Sigma_x^{-1} S_x\right]$$

Now, we replace $\Sigma_x = L L' + \Sigma_\epsilon$, leading to:

$$\ell(L, \Sigma_\epsilon \mid X, \widehat{\mu}_x = \bar{x}) = -\frac{np}{2} \log 2\pi - \frac{n}{2} \log |L L' + \Sigma_\epsilon| - \frac{n-1}{2} \operatorname{Tr}\!\left[(L L' + \Sigma_\epsilon)^{-1} S_x\right]$$
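The profiled log-likelihood above is straightforward to evaluate numerically. The following Python sketch (the function name and the representation of $\Sigma_\epsilon$ by its diagonal vector are my own choices, not from the slides) computes $\ell(L, \Sigma_\epsilon \mid X, \widehat{\mu}_x = \bar{x})$ from a loading matrix, a vector of specific variances and the sample covariance matrix:

```python
import numpy as np

def factor_loglik(L, psi, S_x, n):
    """Gaussian factor-model log-likelihood with the mean profiled out
    at mu_hat = x_bar, following the last formula on the slide.

    L   : (p, r) loading matrix
    psi : (p,)  diagonal of the specific-variance matrix Sigma_eps
    S_x : (p, p) sample covariance matrix
    n   : sample size
    """
    p = S_x.shape[0]
    Sigma = L @ L.T + np.diag(psi)             # implied covariance LL' + Sigma_eps
    _, logdet = np.linalg.slogdet(Sigma)       # numerically stable log-determinant
    trace_term = np.trace(np.linalg.solve(Sigma, S_x))
    return (-0.5 * n * p * np.log(2 * np.pi)
            - 0.5 * n * logdet
            - 0.5 * (n - 1) * trace_term)
```

Using `slogdet` and `solve` avoids forming $\Sigma_x^{-1}$ explicitly, which matters when $\Sigma_x$ is close to singular.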

Estimation of the loading matrix and the factor scores

There is no explicit form for the MLEs of $L$ and $\Sigma_\epsilon$, denoted by $\widehat{L}$ and $\widehat{\Sigma}_\epsilon$, respectively, unless some restrictions on the form of these matrices are imposed. Consequently, numerical optimization methods are required to obtain the MLEs.

In any case, the MLE is invariant under linear transformations of the variables. Therefore, the solutions obtained using the original or the scaled variables are equivalent. Indeed, the MLE, as implemented in R, provides the solution for the scaled data.
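Lacking a closed form, one can hand the negative log-likelihood to a general-purpose optimizer. This is a minimal, hypothetical sketch, not the refined profiled algorithm behind R's `factanal`: it parametrizes the diagonal of $\Sigma_\epsilon$ on the log scale to keep the specific variances positive and ignores the rotation indeterminacy of $L$.

```python
import numpy as np
from scipy.optimize import minimize

def fit_factor_mle(S_x, n, r, seed=0):
    """Numerically maximize the factor-model log-likelihood over L and
    the diagonal psi of Sigma_eps (illustrative sketch only)."""
    p = S_x.shape[0]
    rng = np.random.default_rng(seed)

    def negloglik(theta):
        L = theta[:p * r].reshape(p, r)
        psi = np.exp(theta[p * r:])            # positive specific variances
        Sigma = L @ L.T + np.diag(psi)
        _, logdet = np.linalg.slogdet(Sigma)
        # the constant (np/2) log 2pi is dropped: it does not move the maximizer
        return 0.5 * n * logdet + 0.5 * (n - 1) * np.trace(np.linalg.solve(Sigma, S_x))

    theta0 = np.concatenate([0.1 * rng.normal(size=p * r), np.zeros(p)])
    res = minimize(negloglik, theta0, method="L-BFGS-B")
    L_hat = res.x[:p * r].reshape(p, r)
    return L_hat, np.exp(res.x[p * r:])
```

Because the solution is invariant under orthogonal rotations of $L$, only the implied covariance $\widehat{L}\widehat{L}' + \widehat{\Sigma}_\epsilon$ is directly comparable across runs or implementations.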

Estimation of the loading matrix and the factor scores

In order to determine the number of factors with MLE (assuming Gaussianity), we can use the likelihood ratio test for the following hypotheses:

H0: the number of factors is r
H1: the number of factors is not r

The likelihood ratio test statistic for these hypotheses is given by:

$$\lambda = n \log \frac{|\widehat{L}\widehat{L}' + \widehat{\Sigma}_\epsilon|}{|\widehat{\Sigma}_x|} - np + (n-1) \operatorname{Tr}\!\left[(\widehat{L}\widehat{L}' + \widehat{\Sigma}_\epsilon)^{-1} S_x\right]$$

where $\widehat{\Sigma}_x$ is the (unrestricted) MLE of $\Sigma_x$ and $S_x$ is the sample covariance matrix.

Estimation of the loading matrix and the factor scores

Under the null hypothesis H0, the test statistic $\lambda$ has a $\chi^2$ distribution with $\frac{1}{2}\left[(p-r)^2 - (p+r)\right]$ degrees of freedom.

The idea is to apply the LR test sequentially, i.e., start with r = 1 and, if the test rejects, consider r = 2, and so on.

Note, however, that we can only consider values of r that satisfy $(p-r)^2 - (p+r) > 0$. Thus, there exists a maximum number of factors that can be tested with MLE.
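The sequential procedure can be sketched as follows, assuming the LR statistics $\lambda$ for r = 1, 2, ... have already been computed (the function names and the significance level are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import chi2

def lrt_dof(p, r):
    """Degrees of freedom of the LR test: ((p - r)^2 - (p + r)) / 2."""
    return ((p - r) ** 2 - (p + r)) / 2

def max_factors(p):
    """Largest r with a positive number of degrees of freedom."""
    return max(r for r in range(1, p) if lrt_dof(p, r) > 0)

def sequential_lrt(lambdas, p, alpha=0.05):
    """Return the first r = 1, 2, ... whose test is not rejected
    (None if every test considered rejects)."""
    for r, lam in enumerate(lambdas, start=1):
        if chi2.sf(lam, df=lrt_dof(p, r)) > alpha:
            return r
    return None
```

With the statistics reported later for the US-states example (89.63, 46.53, 20.57 and 7.39, with p = 8 variables), this procedure selects four factors at the 1% level.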

Estimation of the loading matrix and the factor scores

The regression factor scores method is usually used with MLE to estimate the factor scores. The method assumes that the factors are random variables and looks for a linear predictor that minimizes the mean squared prediction error.

Under the model, the pair $(f_i, x_i)$ has a multivariate Gaussian distribution. Therefore, it is possible to show that the linear predictor that minimizes the mean squared prediction error is just:

$$E[f_i \mid x_i] = \left(I_r + L' \Sigma_\epsilon^{-1} L\right)^{-1} L' \Sigma_\epsilon^{-1} (x_i - \mu_x)$$

The final estimate is obtained by replacing $\mu_x$, $L$ and $\Sigma_\epsilon$ with their respective ML estimates.
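The regression scores formula can be implemented directly. This sketch (names are my own) assumes $\Sigma_\epsilon$ is diagonal and stored as a vector `psi`; by the Woodbury identity the expression equals $L' \Sigma_x^{-1}(x_i - \mu_x)$, so both forms give the same scores.

```python
import numpy as np

def regression_scores(X, mu, L, psi):
    """Regression factor scores E[f_i | x_i] for every row of X:
    (I_r + L' Sigma_eps^{-1} L)^{-1} L' Sigma_eps^{-1} (x_i - mu).

    X   : (n, p) data matrix
    mu  : (p,)  mean vector (in practice, the ML estimate x_bar)
    L   : (p, r) loading matrix
    psi : (p,)  diagonal of Sigma_eps
    """
    r = L.shape[1]
    PiL = L / psi[:, None]                             # Sigma_eps^{-1} L (Sigma_eps diagonal)
    A = np.linalg.solve(np.eye(r) + L.T @ PiL, PiL.T)  # (r, p) predictor matrix
    return (X - mu) @ A.T                              # (n, r) scores, one row per observation
```

Working with the r x r matrix $I_r + L' \Sigma_\epsilon^{-1} L$ instead of the p x p matrix $\Sigma_x$ is cheaper when the number of factors is much smaller than the number of variables.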

Illustrative example (I)

Consider the eight variables measured on the 50 states of the USA. Note that these data are far from Gaussian; nevertheless, we apply the MLE approach to estimate the factor model.

The values of the LRT statistic $\lambda$ for r = 1, 2, 3 and 4 are 89.63, 46.53, 20.57 and 7.39, with associated p-values 8.62 x 10^-11, 1.16 x 10^-5, 0.00446 and 0.0249, respectively. Therefore, four factors are adequate according to the LRT.

However, since the data are highly non-Gaussian, these p-values should be taken with caution.

Illustrative example (I)

The estimated loading matrix, using the scaled data and the varimax rotation, is given by:

$$M = \begin{pmatrix}
0.00 & 0.11 & 0.22 & 0.96 \\
0.65 & 0.00 & 0.14 & 0.00 \\
0.51 & 0.41 & 0.54 & 0.10 \\
0.45 & 0.72 & 0.00 & 0.00 \\
0.23 & 0.88 & 0.35 & 0.18 \\
0.93 & 0.19 & 0.14 & 0.25 \\
0.15 & 0.14 & 0.94 & 0.23 \\
0.31 & 0.40 & 0.00 & 0.00
\end{pmatrix}$$

The estimated uniquenesses are 0.0050, 0.538, 0.258, 0.255, 0.0050, 0.0050, 0.0050 and 0.7383. Therefore, the variables best explained by the factors are log-population, murder, HS graduates and frost.
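For scaled variables, the uniqueness of each variable is 1 minus its communality, i.e., the sum of squared loadings in its row. The reported uniquenesses can therefore be roughly cross-checked from the rounded loadings above (the check is insensitive to the signs of the loadings, since they are squared):

```python
import numpy as np

# Loading matrix as reported on the slide (rounded to two decimals).
M = np.array([
    [0.00, 0.11, 0.22, 0.96],
    [0.65, 0.00, 0.14, 0.00],
    [0.51, 0.41, 0.54, 0.10],
    [0.45, 0.72, 0.00, 0.00],
    [0.23, 0.88, 0.35, 0.18],
    [0.93, 0.19, 0.14, 0.25],
    [0.15, 0.14, 0.94, 0.23],
    [0.31, 0.40, 0.00, 0.00],
])

# uniqueness_j = 1 - communality_j = 1 - sum of squared loadings of row j
uniquenesses = 1.0 - (M ** 2).sum(axis=1)
```

The values agree with the reported uniquenesses up to rounding of the displayed loadings.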

Illustrative example (I)

The first factor distinguishes states with long-lived, rich and educated populations from violent states with ill-educated populations.

The second factor distinguishes violent states from long-lived states.

The third factor distinguishes cold states with educated people from warm states with ill-educated people.

The fourth factor distinguishes highly populated states from less populated states.

The following slides show the estimated factor scores.

Illustrative example (I)

[Figures: pairwise scatter plots of the MLE factor scores for the 50 US states: second vs. first factor, third vs. first, fourth vs. first, third vs. second, fourth vs. second, and fourth vs. third.]

Chapter outline

We are now ready for:

Chapter 5: Multidimensional scaling

1 Introduction
2 The factor model
3 Estimation of the loading matrix and the factor scores