Chapter 12. FACTOR ANALYSIS


Charles Spearman (1863-1945). British psychologist. Spearman was an officer in the British Army in India and, upon his return at the age of 40 and influenced by the work of Galton, he decided to do his doctoral thesis on the objective measurement of intelligence. He proposed the first factor analysis model, based on a common factor, the g factor, and a specific component. Upon receiving his PhD he was named full professor and occupied the first Chair in Psychology at University College, London.

12.1 INTRODUCTION

Factor analysis is used to explain the variability among a set of observed variables in terms of a small number of latent, or unobserved, variables called factors. For example, suppose that we take twenty body measurements from a person: height, length of torso and extremities, shoulder width, weight, etc. It is intuitive to think that these measurements are not independent of each other, and that if we know one of them we can predict the others with a small margin of error.

One explanation for this fact is that these measurements are determined by the same genes and are therefore highly correlated, so that if we know some of them we can predict the values of the other variables with small error. As a second example, suppose we are interested in studying human development around the world and that we have many economic, social and demographic variables available, all of them, in general, interdependent and related to development. We can ask ourselves whether the development of a country depends on a small number of factors such that, if we knew their values, we could predict the whole set of variables for each country. As a third example, we use different tests to measure the intellectual capacity of an individual to process information and solve problems. We can ask ourselves whether there are factors, not directly observable, which explain the set of observed results. The set of these factors is what we will call intelligence, and it is important to know how many different dimensions this concept has and how to characterize and measure them.

Factor analysis came about thanks to the interest of Karl Pearson and Charles Spearman in understanding the dimensions of human intelligence in the 1930s; as a result, many of its advances were produced in the area of psychometry.

Factor analysis is related to principal components, but there are certain differences. First, principal components are constructed to explain variances, whereas factors are constructed to explain the covariances or correlations between the variables. Second, principal components is a descriptive tool, while factor analysis assumes a formal statistical model. On the other hand, principal components can be seen as a particular case of factor analysis, as we will see later.

12.2 THE FACTOR MODEL

Basic Hypothesis

Suppose that we observe a vector of variables x, of dimension (p × 1), on the elements of a population. The factor analysis model establishes that this vector is generated by the equation

x = μ + Λf + u    (12.1)

where:

1. f is a vector (m × 1) of latent variables, or unobserved factors. We assume that it follows a distribution N_m(0, I), that is, the factors have zero mean, are independent of each other and have a normal distribution.

2. Λ is a matrix (p × m) of unknown constants (m < p). It contains the coefficients which describe how the factors, f, affect the observed variables, x, and is called the loading matrix.

3. u is a vector (p × 1) of unobserved perturbations. It includes the effect of all the variables, other than the factors, which influence x. We assume that u has a distribution N_p(0, ψ), where ψ is diagonal, and that the perturbations are uncorrelated with the factors f.
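To make the generating equation concrete, here is a minimal Python sketch that simulates data from (12.1) with uncorrelated standard normal factors and a diagonal perturbation covariance. The dimensions and all numerical values are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 1000, 5, 2                      # sample size, observed variables, factors

Lambda = np.array([[0.9, 0.0],            # illustrative (p x m) loading matrix
                   [0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.8],
                   [0.0, 0.9]])
mu = np.zeros(p)                          # means of the observed variables
psi = np.diag([0.2, 0.3, 0.4, 0.3, 0.2])  # diagonal covariance of the perturbations

F = rng.standard_normal((n, m))                          # factors f ~ N_m(0, I)
U = rng.multivariate_normal(np.zeros(p), psi, size=n)    # perturbations u ~ N_p(0, psi)
X = mu + F @ Lambda.T + U                                # X = 1 mu' + F Lambda' + U

# The implied covariance matrix is V = Lambda Lambda' + psi (equation (12.2) below);
# the sample covariance of the simulated data should be close to it.
V = Lambda @ Lambda.T + psi
print(np.round(np.cov(X, rowvar=False) - V, 2))
```

Each row of X is one observation x_i′ generated as in (12.1).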

With these three hypotheses we deduce that: (a) μ is the mean of the variable x, since both the factors and the perturbations have zero mean; (b) x has a normal distribution, being a sum of normal variables, and letting V be its covariance matrix, x ~ N_p(μ, V).

Equation (12.1) implies that, given a random sample of n elements generated by the factor model, each piece of data x_{ij} can be written as

x_{ij} = μ_j + λ_{j1} f_{1i} + ... + λ_{jm} f_{mi} + u_{ij},    i = 1, ..., n;  j = 1, ..., p,

which decomposes x_{ij}, the value observed on individual i for variable j, as a sum of m + 2 terms. The first is the mean of variable j, μ_j; the terms from the second to the (m+1)-th contain the effects of the m factors; and the last is a perturbation specific to each observation, u_{ij}. The effects of the factors on x_{ij} are the products of the coefficients λ_{j1}, ..., λ_{jm}, which depend on the relationship between each factor and the variable j (and are the same for all the items in the sample), times the values of the m factors for the sampled item i, f_{1i}, ..., f_{mi}. Joining the equations for all of the observations, the data matrix X, of dimensions (n × p), can be written as

X = 1μ′ + FΛ′ + U

where 1 is an (n × 1) vector of ones, F is an (n × m) matrix which contains the m factors for the n items of the sample, Λ′ is the transpose of the loading matrix (m × p), whose constant coefficients relate the variables and the factors, and U is an (n × p) matrix of perturbations.

Properties

The loading matrix Λ contains the covariances between the factors and the observed variables. Note that the (p × m) covariance matrix between the variables and the factors is obtained by multiplying (12.1) by f′ on the right and taking expected values:

E[(x − μ)f′] = ΛE[ff′] + E[uf′] = Λ

since, by hypothesis, the factors are uncorrelated (E[ff′] = I), have zero mean, and are uncorrelated with the perturbations (E[uf′] = 0). This equation indicates that the terms λ_{ij} of the loading matrix Λ represent the covariance between the variable x_i and the factor f_j and, as the factors have unit variance, they are also the regression coefficients when we explain the observed variables by the factors. In the particular case in which the x variables are standardized, the terms λ_{ij} are also the correlations between the variables and the factors.

The covariance matrix of the observations verifies, according to (12.1), that

V = E[(x − μ)(x − μ)′] = ΛE[ff′]Λ′ + E[uu′]

since E[fu′] = 0, the factors and the noise being uncorrelated. Thus, we obtain the fundamental property:

V = ΛΛ′ + ψ.    (12.2)

which establishes that the covariance matrix of the observed data can be decomposed as the sum of two matrices:

(1) The first, ΛΛ′, is a symmetric matrix of rank m < p. This matrix contains the part which is common to the set of variables and depends on the covariances between the variables and the factors.

(2) The second, ψ, is diagonal, and contains the specific part of each variable, which is independent of the rest.

This decomposition implies that the variances of the observed variables can be written as

σ_i² = Σ_{j=1}^m λ_{ij}² + ψ_i²,    i = 1, ..., p,

where the first term is the sum of the effects of the factors and the second is the variance of the perturbation. Letting

h_i² = Σ_{j=1}^m λ_{ij}²

be the sum of the factor effects, which we will call the communality, we get

σ_i² = h_i² + ψ_i²,    i = 1, ..., p.    (12.3)

This equality can be interpreted as a decomposition of the variance into

Observed variance = Common variance (communality) + Specific variance,

which is analogous to the classical decomposition of variability into an explained part and an unexplained part, as carried out in the analysis of variance. In the factor model the explained part is due to the factors and the unexplained part is due to the noise. This equation is the basis for the analysis which follows.

Example: Let us assume that we have three variables generated by two factors. The covariance matrix must verify

[σ_11 σ_12 σ_13; σ_21 σ_22 σ_23; σ_31 σ_32 σ_33] = [λ_11 λ_12; λ_21 λ_22; λ_31 λ_32] [λ_11 λ_21 λ_31; λ_12 λ_22 λ_32] + [ψ_11 0 0; 0 ψ_22 0; 0 0 ψ_33].

This equality provides 6 different equations (remember that, since V is symmetric, it has only 6 different terms). The first is

σ_11 = λ_11² + λ_12² + ψ_11.

Letting h_1² = λ_11² + λ_12² be the contribution of the two factors to variable 1, the six equations are

σ_ii = h_i² + ψ_i²,    i = 1, 2, 3,
σ_ij = λ_i1 λ_j1 + λ_i2 λ_j2,    i, j = 1, 2, 3,  i ≠ j.

Uniqueness of the model

In the factor model neither the loading matrix, Λ, nor the factors, f, are observable. This poses a problem of indeterminacy: we will say that two representations (Λ, f) and (Λ*, f*) are equivalent if Λf = Λ*f*. This situation leads to two types of indeterminacy: (1) a set of data can be explained with the same accuracy with correlated or with uncorrelated factors; (2) the factors are not determined uniquely. Let us analyze these two indeterminacies.

To show the first, note that if H is any nonsingular matrix, the representation (12.1) can be written as

x = μ + ΛHH⁻¹f + u    (12.4)

and, letting Λ* = ΛH be the new loading matrix and f* = H⁻¹f the new factors,

x = μ + Λ*f* + u,    (12.5)

where the new factors f* now have distribution N(0, H⁻¹(H⁻¹)′) and, thus, they are correlated. Analogously, starting from correlated factors, f ~ N(0, V_f), we can always find an equivalent expression of the variables using a model with uncorrelated factors. To show this, let A be a matrix such that V_f = AA′ (this matrix always exists if V_f is positive definite); then A⁻¹V_f(A⁻¹)′ = I, and writing x = μ + (ΛA)(A⁻¹f) + u, and taking Λ* = ΛA as the new coefficient matrix of the factors and f* = A⁻¹f as the new factors, the model is equivalent to another with uncorrelated factors. This indeterminacy is resolved in the hypotheses of the model by always taking the factors to be uncorrelated.

The second type of indeterminacy appears because, if H is orthogonal, the models x = μ + Λf + u and x = μ + (ΛH)(H′f) + u are indistinguishable. Both contain uncorrelated factors with an identity covariance matrix. In this sense, we say that the factor model is indeterminate with respect to rotations. This indeterminacy is resolved by imposing restrictions on the components of the loading matrix, as we will see in the next section.

Example: We assume that x = (x_1, x_2, x_3)′ follows a factor model M1, x = Λf + u, with two uncorrelated factors f = (f_1, f_2)′. We are going to write this model as another, equivalent, model with uncorrelated factors. Take the orthogonal matrix

H = (1/√2) [1 1; 1 −1],

which satisfies H⁻¹ = H′ = H. Then

x = (ΛH)(H′f) + u.

Denoting this model by M2, it can also be written as

x = Λ*g + u,

where Λ* = ΛH and the new factors, g, are related to the previous ones, f, by

g = H′f = (1/√2) [1 1; 1 −1] f,

and are thus a rotation of the initial factors. We now check that these new factors are also uncorrelated. Their covariance matrix is

V_g = H′V_f H,

and if V_f = I then V_g = H′H = I, from which it is deduced that the models M1 and M2 are indistinguishable.

Normalization of the factor model

Since the factor model is indeterminate with respect to rotations, the matrix Λ is not identified. This implies that even if we observed the whole population, so that μ and V were known, we could not determine Λ uniquely. The solution is to impose restrictions on its terms. The two principal estimation methods which we will study next use one of the two following normalizations.

Criterion 1. Require:

Λ′Λ = D = diagonal.    (12.6)

With this normalization, the vectors which define the effect of each factor on the p observed variables are orthogonal. In this way, besides being uncorrelated, the factors produce the most distinct possible effects on the variables. We are going to prove that this normalization defines a loading matrix uniquely. First assume that we have a matrix Λ such that the product Λ′Λ is not diagonal. We transform the factors with Λ* = ΛH, where H is the matrix which contains, as columns, the eigenvectors of Λ′Λ. Then

Λ*′Λ* = H′Λ′ΛH    (12.7)

and, since H diagonalizes Λ′Λ, the matrix Λ* verifies condition (12.6). We now see that this is the only matrix which does so. Suppose that we rotate this matrix and let Λ** = Λ*C, where C is orthogonal. Then the matrix Λ**′Λ** = C′(Λ*′Λ*)C will not, in general, be diagonal. Analogously, if we start with a matrix which verifies (12.6) and we rotate it, it will no longer verify this condition.
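As a quick numerical illustration of the two previous points, the following Python sketch (with an arbitrary, made-up loading matrix) checks that an orthogonal rotation of the loadings leaves the implied covariance matrix ΛΛ′ + ψ unchanged, and that rotating with the eigenvectors of Λ′Λ produces a loading matrix satisfying Criterion 1.

```python
import numpy as np

rng = np.random.default_rng(1)
Lambda = rng.normal(size=(5, 2))                 # illustrative 5 x 2 loading matrix
psi = np.diag(rng.uniform(0.2, 0.5, size=5))     # illustrative specific variances

# Any orthogonal H gives an equivalent model: the implied covariance is unchanged.
theta = 0.7
H = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
V1 = Lambda @ Lambda.T + psi
V2 = (Lambda @ H) @ (Lambda @ H).T + psi
print(np.allclose(V1, V2))                       # True: rotations are not identified

# Criterion 1: rotate with the eigenvectors of Lambda' Lambda so that the new
# loading matrix Lambda* satisfies Lambda*' Lambda* = D (diagonal).
_, Hstar = np.linalg.eigh(Lambda.T @ Lambda)
Lambda_star = Lambda @ Hstar
print(np.round(Lambda_star.T @ Lambda_star, 6))  # diagonal matrix
```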

When this normalization holds, postmultiplying equation (12.2) by Λ allows us to write

(V − ψ)Λ = ΛD,

which means that the columns of Λ are eigenvectors of the matrix V − ψ, which has the diagonal terms of D as its eigenvalues. This property is used in the estimation by the principal factor method.

Criterion 2. Require:

Λ′ψ⁻¹Λ = D = diagonal.    (12.8)

In this normalization the effects of the factors on the variables, weighted by the variances of the perturbations of each equation, become uncorrelated. As before, this normalization defines a loading matrix uniquely. To show this, assume that Λ′ψ⁻¹Λ is not diagonal and transform with Λ* = ΛH. Then

Λ*′ψ⁻¹Λ* = H′(Λ′ψ⁻¹Λ)H    (12.9)

and, since Λ′ψ⁻¹Λ is a symmetric and non-negative definite matrix, it can always be diagonalized if we choose as H the matrix which contains the eigenvectors of Λ′ψ⁻¹Λ in its columns. Analogously, if (12.8) is verified from the beginning and we rotate the loading matrix, this condition is no longer verified. This is the normalization used in maximum likelihood estimation. Its justification is that in this way the factors are conditionally independent given the data, as is shown in the Appendix. With this normalization, postmultiplying equation (12.2) by ψ⁻¹Λ we get

Vψ⁻¹Λ = Λ(D + I),

and premultiplying by ψ^{-1/2} the result is

ψ^{-1/2}Vψ⁻¹Λ = ψ^{-1/2}Λ(D + I),

which implies

(ψ^{-1/2}Vψ^{-1/2})(ψ^{-1/2}Λ) = ψ^{-1/2}Λ(D + I),

and we conclude that the matrix ψ^{-1/2}Vψ^{-1/2} has eigenvectors ψ^{-1/2}Λ with eigenvalues given by the diagonal of D + I. This property is used in maximum likelihood estimation.

Maximum number of factors

If we replace the theoretical covariance matrix, V, in (12.2) with the sample matrix, S, the system will be identified if it is possible to solve it uniquely. For this there is a restriction on the number of possible factors. The number of equations which we obtain from (12.2) is equal to the number of distinct terms of S, which is p + p(p − 1)/2 = p(p + 1)/2. The number of unknowns on the right-hand side is pm, the coefficients of the matrix Λ, plus the p terms of the diagonal of ψ, minus the restrictions imposed in order to identify the matrix Λ. Requiring Λ′ψ⁻¹Λ to be diagonal imposes m(m − 1)/2 restrictions on the terms of Λ. For the system to be determined there must be at least as many equations as unknowns. If there are fewer equations than unknowns it is impossible to find a single solution and the model is not identified. If the number of equations is exactly equal to the number of unknowns there will be a single solution. If there are more equations than unknowns, we can solve the system by least squares, finding the values of the parameters which minimize the estimation errors. Therefore we require

p + pm − m(m − 1)/2 ≤ p(p + 1)/2,

which is equivalent to

p + m ≤ p² − 2pm + m²,  that is,  (p − m)² ≥ p + m.

The reader can prove that this inequality implies that when p is not large (less than 10) then, approximately, the maximum number of factors must be less than half the number of variables minus one. For example, the maximum number of factors with 7 variables is 3.
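The bound (p − m)² ≥ p + m is easy to tabulate. The following small Python helper (the function name is our own) returns the largest admissible m for a given p:

```python
def max_factors(p: int) -> int:
    """Largest m with (p - m)**2 >= p + m, the identification bound derived above."""
    m = 0
    while (p - (m + 1)) ** 2 >= p + (m + 1):
        m += 1
    return m

for p in range(3, 11):
    print(p, max_factors(p))
# e.g. p = 7 gives 3, in line with the rule of thumb stated in the text.
```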

12.3 THE PRINCIPAL FACTOR METHOD

The principal factor method is a method for estimating the loading matrix based on principal components. It avoids the need to solve the maximum likelihood equations, which are more complex, and it has the advantage that the dimension of the system can be identified approximately. Because of its simplicity it is used in many computer programs. Its basis is the following: suppose that we can obtain an initial estimate of the covariance matrix of the perturbations, ψ̂. Then we can write

S − ψ̂ = ΛΛ′,    (12.10)

and, as S − ψ̂ is symmetric, it can always be decomposed as

S − ψ̂ = HGH′ = (HG^{1/2})(HG^{1/2})′    (12.11)

where H is square of order p and orthogonal, and G, also of order p, is diagonal and contains the eigenvalues of S − ψ̂. The factor model establishes that G must be diagonal of the form

G = [ G_1  O ; O  O ],

where G_1 is (m × m) and the zero blocks have the appropriate dimensions, since S − ψ̂ has rank m. Thus, if we let H_1 be the (p × m) matrix which contains the eigenvectors associated with the non-null eigenvalues in G_1, we can build an estimator of Λ as the (p × m) matrix

Λ̂ = H_1 G_1^{1/2}.    (12.12)

Note that the resulting normalization is

Λ̂′Λ̂ = G_1^{1/2} H_1′H_1 G_1^{1/2} = G_1 = diagonal    (12.13)

since the eigenvectors of a symmetric matrix are orthogonal, so that H_1′H_1 = I_m. Therefore, with this method we obtain an estimate Λ̂ whose columns are orthogonal to each other.

In practice, the estimation is carried out iteratively as follows:

1. Start (i = 1) from an initial estimate Λ̂_i and compute ψ̂_i = diag(S − Λ̂_i Λ̂_i′).

2. Calculate the square, symmetric matrix Q_i = S − ψ̂_i.

3. Obtain the spectral decomposition of Q_i, Q_i = H_{1i} G_{1i} H_{1i}′ + H_{2i} G_{2i} H_{2i}′, where G_{1i} contains the m greatest eigenvalues of Q_i and H_{1i} their eigenvectors. We choose m so that the remaining eigenvalues, contained in G_{2i}, are all small and of similar size. The matrix Q_i may not be positive definite and some of its eigenvalues can be negative. This is not a serious problem if these eigenvalues are small and we can take them to be near zero.

4. Take Λ̂_{i+1} = H_{1i} G_{1i}^{1/2} and go back to (1). Iterate until convergence is reached, that is, until ||Λ̂_{n+1} − Λ̂_n|| < ε.

The estimators obtained will be consistent, but not efficient as in the case of maximum likelihood. Neither are they invariant to linear transformations, as the ML estimators are; that is, the same results are not necessarily obtained with the covariance matrix as with the correlation matrix.
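The following Python sketch implements the iteration just described (steps 1 to 4). It is a minimal version for illustration: the initial specific variances are taken as 1/s^{jj} (the second of the two options discussed below), negative eigenvalues of Q_i are truncated at zero, and a simple sign convention is imposed on the eigenvectors so that the convergence criterion ||Λ̂_{n+1} − Λ̂_n|| < ε is meaningful.

```python
import numpy as np

def principal_factor(S, m, tol=1e-4, max_iter=200):
    """Principal factor estimation of the loadings (p x m) and specific variances (p,)."""
    p = S.shape[0]
    psi = 1.0 / np.diag(np.linalg.inv(S))          # initial specific variances: 1 / s^{jj}
    Lam = np.zeros((p, m))
    for _ in range(max_iter):
        Q = S - np.diag(psi)                       # step 2: reduced covariance matrix
        vals, vecs = np.linalg.eigh(Q)             # step 3: spectral decomposition
        idx = np.argsort(vals)[::-1][:m]           # keep the m largest eigenvalues
        G1 = np.clip(vals[idx], 0.0, None)         # small negative eigenvalues -> 0
        Lam_new = vecs[:, idx] * np.sqrt(G1)       # step 4: Lambda = H1 G1^{1/2}
        signs = np.sign(Lam_new[np.abs(Lam_new).argmax(axis=0), np.arange(m)])
        Lam_new = Lam_new * signs                  # fix the sign of each column
        if np.linalg.norm(Lam_new - Lam) < tol:    # convergence criterion
            return Lam_new, psi
        Lam = Lam_new
        psi = np.diag(S - Lam @ Lam.T).copy()      # step 1: update specific variances
    return Lam, psi
```

For example, Lam, psi = principal_factor(S, m=1) gives the kind of single-factor fit worked through by hand in the ACCIONES example below.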

To put this idea into practice we have to specify how to obtain the initial estimate Λ̂_1 or ψ̂_1, a problem which is known as communality estimation.

Communality estimation

Estimating the terms ψ_i² is equivalent to defining values for the diagonal terms, h_i², of ΛΛ′, since h_i² = s_i² − ψ̂_i². The following alternatives are used:

1. Take ψ̂_i = 0. This is equivalent to extracting the principal components of S. It amounts to taking ĥ_i² = s_i² (in the case of correlations, ĥ_i² = 1), which is clearly its maximum value, so that we may begin with a significant bias.

2. Take ψ̂_j² = 1/s^{jj}, where s^{jj} is the j-th diagonal element of the precision matrix S⁻¹. According to Appendix 3.2, this is equivalent to taking ĥ_j² as

ĥ_j² = s_j² − s_j²(1 − R_j²) = s_j² R_j²,    (12.14)

where R_j² is the multiple correlation coefficient between x_j and the rest of the variables. Intuitively, the greater R_j² is, the greater the communality ĥ_j². With this method we start with an estimate of h_i² that is biased downwards, since ĥ_i² ≤ h_i². To show this, suppose, for example, that the true model for the variable x_1 is

x_1 = Σ_{j=1}^m λ_{1j} f_j + u_1,    (12.15)

which is associated with the decomposition σ_1² = h_1² + ψ_1². The proportion of explained variance is h_1²/σ_1². If we write the regression equation

x_1 = b_2 x_2 + ... + b_p x_p + ε_1

and replace each variable with its expression in terms of the factors, we have

x_1 = b_2(Σ_j λ_{2j} f_j + u_2) + ... + b_p(Σ_j λ_{pj} f_j + u_p) + ε_1,    (12.16)

which leads to a decomposition of the variance σ_1² = ĥ_1² + ψ̂_1². Clearly ĥ_1² ≤ h_1², since in (12.16) the noises u_2, ..., u_p of the other equations are forced to enter together with the factors, instead of the factors alone as in (12.15). Moreover, it is possible that a factor affects x_1 but not the rest of the variables, so that it will not appear in equation (12.16). To summarize, the communality estimated from (12.16) is a lower bound for the true value of the communality.
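The equivalence between the two forms of the second option, ψ̂_j = 1/s^{jj} and ĥ_j² = s_j² R_j², can be checked numerically. The sketch below uses arbitrary simulated data and regresses each variable on the others to obtain R_j²; everything in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))  # arbitrary correlated data
S = np.cov(X, rowvar=False)

# Option 2: psi_j = 1 / s^{jj}, with s^{jj} the j-th diagonal element of S^{-1}
psi0 = 1.0 / np.diag(np.linalg.inv(S))
h2_0 = np.diag(S) - psi0                  # initial communalities h_j^2 = s_j^2 - psi_j

# Equivalent form h_j^2 = s_j^2 * R_j^2, with R_j^2 the squared multiple correlation
# of x_j on the remaining variables (computed here by ordinary least squares).
R2 = np.empty(4)
for j in range(4):
    y = X[:, j] - X[:, j].mean()
    Z = np.delete(X, j, axis=1)
    Z = Z - Z.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    R2[j] = 1.0 - np.sum((y - Z @ beta) ** 2) / np.sum(y ** 2)

print(np.allclose(h2_0, np.diag(S) * R2))  # True: the two expressions coincide
```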

Example: In this example we show in detail the iterations of the principal factor algorithm for the ACCIONES data in Annex I, working with the covariance matrix S of the data in logarithms. In order to estimate the loading matrix we carry out the steps of the principal factor algorithm described above. Before starting the algorithm we need to set the bound used to decide convergence. We take a large ε, 0.05, so that the algorithm converges in few iterations despite the accumulated rounding errors.

Step 1. Taking the second alternative for the initial estimation of the communalities, diag(ψ̂_i²) = 1/s^{jj}, where s^{jj} is the j-th diagonal element of the matrix S⁻¹.

Step 2. We calculate the square, symmetric matrix Q_i = S − ψ̂_i.

Step 3. Spectral decomposition of Q_i and separation into the two terms H_{1i}G_{1i}H_{1i}′ and H_{2i}G_{2i}H_{2i}′. The eigenvalues of Q_i are 0.379, 0.094 and a third, negative one; we observe that the matrix is therefore not positive definite. Since one eigenvalue is much larger than the rest, we take a single factor.

Step 4. We calculate Λ̂_{i+1} = H_{1i}G_{1i}^{1/2}. This is the first estimate of the loading matrix. We are going to iterate to improve this estimate and, to do so, we return to Step 1.

Step 1. We estimate the terms of the diagonal of ψ̂_i using ψ̂_i = diag(S − Λ̂Λ̂′).

Step 2. We calculate the square, symmetric matrix Q_i = S − ψ̂_i.

Step 3. Spectral decomposition of Q_i = H_{1i}G_{1i}H_{1i}′ + H_{2i}G_{2i}H_{2i}′.

Step 4. We calculate Λ̂_{i+1} = H_{1i}G_{1i}^{1/2} and check whether the convergence criterion ||Λ̂_{n+1} − Λ̂_n|| < ε is fulfilled. The norm of the difference is 0.106 > ε, so we return to Step 1 and iterate until the criterion is fulfilled.

Step 1. We estimate again ψ̂_i = diag(S − Λ̂Λ̂′).

Step 2. We calculate the square, symmetric matrix Q_i = S − ψ̂_i.

Step 3. Spectral decomposition of Q_i; we indicate only the first eigenvector and eigenvalue, the remainder going into H_{2i}G_{2i}H_{2i}′.

Step 4. We calculate Λ̂_{i+1} = H_{1i}G_{1i}^{1/2} and check whether the convergence criterion ||Λ̂_{n+1} − Λ̂_n|| < ε is fulfilled. The norm of the difference is now 0.05, so the convergence criterion has been fulfilled and we retain the model with the estimated parameters, x = Λ̂f + u with u ~ N(0, ψ̂). We see that the equation of the factor obtained is quite different from the first principal component obtained in exercise 5.1.

Example: For the INVEST database a descriptive analysis was carried out in Chapter 4, in which a logarithmic transformation of all the variables and the elimination of the US data were proposed. Using this set of data, once standardized, we illustrate the calculation of a single factor by the principal factor method (in the following example 2 factors are considered). We compare the two methods proposed above for initializing the algorithm with the standardized data. In the first case we start the iterations with ψ̂_j = 0, that is, ĥ²(0) = 1, and the number of iterations needed before converging is 6. The stopping criterion at step k of the algorithm is, in this case, that the maximum difference between the communalities at steps k and k − 1 be less than a small fixed bound. The following table shows the estimates of the communalities for steps i = 0, 1, 2, 3, 6 (rows: INTER.A, INTER.B, AGRIC, BIOLO, MEDIC, CHEMIS, ENGIN, PHYSIC; columns: ĥ²(0), ĥ²(1), ĥ²(2), ĥ²(3), ĥ²(6)).

The final column, ĥ²(6), gives the result once the algorithm has converged. If we begin the algorithm with the second method, ψ̂_j = 1 − R_j², so that ĥ²(0) = R_j², the number of iterations before convergence is 5. A second table, with the same layout, shows how the estimates of the communalities vary over the steps i = 0, 1, 2, 3, 5 for the same eight variables, the final column giving the result once convergence has been reached. Having initiated the algorithm at the point nearest to the final solution, the convergence is faster, and by the second iteration the result is already quite close to the final one. We can also see that the initial estimate of the communalities, ĥ²(0), is an upper bound for the final estimate, ĥ²(5). A third table presents the estimate Λ̂(0) from which we started in each of the two methods, together with the final loadings obtained. The second method gives a Λ̂(0) closer to the final result, especially for those variables where the specific variability is greater.

Generalizations

The principal factor method is a procedure for minimizing the function

F = tr[(S − ΛΛ′ − ψ)²].    (12.17)

Note that this function can be written as

F = Σ_{i=1}^p Σ_{j=1}^p (s_{ij} − v_{ij})²    (12.18)

where the v_{ij} are the elements of the matrix V = ΛΛ′ + ψ. However, using the spectral decomposition, given a square, symmetric and non-negative definite matrix S, the best least squares approximation in the sense of (12.18) by a matrix AA′ of rank m is obtained by taking A = HD^{1/2}, where H contains the eigenvectors associated with the m largest eigenvalues of S and D^{1/2} the square roots of those eigenvalues (see Appendix 5.2), which is what the principal factor method does. Harman (1976) developed the MINRES algorithm, which minimizes (12.17) more efficiently than the principal factor method, and Joreskog (1976) proposed ULS (unweighted least squares), which is based on differentiating (12.17), obtaining Λ̂ as a function of ψ, and then minimizing the resulting function with a Newton-Raphson type non-linear algorithm.
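Minimizing (12.17) can also be handed to a generic numerical optimizer. The sketch below is neither Harman's MINRES nor Joreskog's ULS algorithm; it simply passes the unweighted least squares objective to scipy.optimize.minimize, with the specific variances parameterized on the log scale to keep them positive. Function names and starting values are our own choices.

```python
import numpy as np
from scipy.optimize import minimize

def uls_factor(S, m):
    """Minimize tr[(S - Lambda Lambda' - psi)^2] over loadings and specific variances."""
    p = S.shape[0]

    def objective(theta):
        Lam = theta[:p * m].reshape(p, m)
        psi = np.exp(theta[p * m:])              # positive specific variances
        R = S - Lam @ Lam.T - np.diag(psi)
        return np.sum(R ** 2)                    # = tr[(S - V)^2] for symmetric S - V

    # Start from the principal-component solution of S.
    vals, vecs = np.linalg.eigh(S)
    Lam0 = vecs[:, -m:] * np.sqrt(vals[-m:])
    psi0 = np.log(np.maximum(np.diag(S - Lam0 @ Lam0.T), 1e-3))
    theta0 = np.concatenate([Lam0.ravel(), psi0])
    res = minimize(objective, theta0, method="L-BFGS-B")
    Lam = res.x[:p * m].reshape(p, m)
    return Lam, np.exp(res.x[p * m:])
```

Note that the loadings returned this way are only defined up to an orthogonal rotation, as discussed in the normalization section.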

Example: With the INVEST data used in the previous example, we present the factor analysis with two factors performed with a computer program using the principal factor method. Table 12.1 indicates the variability explained by both factors. The second factor explains little variability (2%) but has been included because of its clear interpretation.

Table 12.1: Variability explained by the first two factors estimated using the principal factor method.

The principal factor algorithm begins with ψ̂_j = 1 − R_j², and 14 iterations are performed before converging to the loadings presented in Table 12.2, which gives the loadings of Factor 1 and Factor 2 and the specific variances ψ_i² for the variables INTER.A, INTER.B, AGRIC, BIOLO, MEDIC, CHEMIS, ENGIN and PHYSIC.

Table 12.2: Loading matrix of the factors and communalities.

The first factor is the sum of the publications in all the databases, and it gives an idea of volume: according to this factor the countries would be ordered by their scientific output. The second factor contrasts biomedical research with technological research. This second component separates Japan and the UK, countries with heavy scientific output. Figure 12.1 shows the distribution of the countries over these two factors. The reader should compare these results with those obtained in Chapter 5 (exercises 5.6 and 5.10) with principal components.

Figure 12.1: Representation of the countries on the plane formed by the first two factors.

12.4 MAXIMUM LIKELIHOOD ESTIMATION

ML estimation of parameters

Direct approach

The matrices of parameters can be formally estimated by maximum likelihood. The density function of the original observations is N_p(μ, V); therefore the likelihood is the one we saw in Chapter 10. Replacing μ with its estimator, the sample mean x̄, the support (log-likelihood) function for V is

L(V|X) = −(n/2) log|V| − (n/2) tr(SV⁻¹),    (12.19)

and replacing V by (12.2), the support function for Λ and ψ is

L(Λ, ψ) = −(n/2) [log|ΛΛ′ + ψ| + tr(S(ΛΛ′ + ψ)⁻¹)].    (12.20)

The maximum likelihood estimators are obtained by maximizing (12.20) with respect to the matrices Λ and ψ. Taking derivatives with respect to these matrices and after certain algebraic manipulations, which are shown in Appendix 12.1 (see also Anderson, 1984, or Lawley and Maxwell, 1971), the following equations are obtained:

ψ̂ = diag(S − Λ̂Λ̂′),    (12.21)

[ψ̂^{-1/2}(S − ψ̂)ψ̂^{-1/2}](ψ̂^{-1/2}Λ̂) = (ψ̂^{-1/2}Λ̂)D,    (12.22)

Λ̂′ψ̂⁻¹Λ̂ = D = diagonal,    (12.23)

where D is the normalization matrix. These three equations allow us to solve the system by a Newton-Raphson type iterative algorithm. The numerical solution is difficult to find because there may be no solution in which ψ̂ is positive definite, and it is then necessary to turn to estimation with restrictions. We observe that (12.22) is an eigenvalue equation: it tells us that ψ̂^{-1/2}Λ̂ contains eigenvectors of the symmetric matrix ψ̂^{-1/2}(S − ψ̂)ψ̂^{-1/2} and that D contains the eigenvalues. The iterative algorithm for solving these equations is:

1. Start with an initial estimate. If we have an estimate Λ̂_i (i = 1 the first time), from the principal factor method for example, the matrix ψ̂_i is calculated using ψ̂_i = diag(S − Λ̂_iΛ̂_i′). Alternatively, we can estimate the matrix ψ̂_i directly as in the principal factor method.

2. The square, symmetric matrix A_i is calculated as A_i = ψ̂_i^{-1/2}(S − ψ̂_i)ψ̂_i^{-1/2} = ψ̂_i^{-1/2}Sψ̂_i^{-1/2} − I. This matrix weights the terms of S by their importance in terms of the specific components.

3. The spectral decomposition of A_i is obtained, A_i = H_{1i}G_{1i}H_{1i}′ + H_{2i}G_{2i}H_{2i}′, where the m greatest eigenvalues of A_i are in the (m × m) diagonal matrix G_{1i}, the p − m smallest are in G_{2i}, and H_{1i} and H_{2i} contain the corresponding eigenvectors.

4. Take Λ̂_{i+1} = ψ̂_i^{1/2}H_{1i}G_{1i}^{1/2} and substitute it in the likelihood function, which is then maximized with respect to ψ. This part is easy to do with a non-linear optimization algorithm. With this result, go back to (2), iterating until convergence is reached.

It is possible for this algorithm to converge to a maximum point where some of the terms of the matrix ψ are negative. Such an inadmissible solution is sometimes called a Heywood solution. Existing programs change these values to positive numbers and attempt to find another maximum point, although the algorithm does not always converge.
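A compact Python sketch of this iteration is given below. It follows steps 1 to 4 literally, doing the inner maximization over ψ in step 4 with a generic numerical optimizer; it is only an illustration (no safeguards against Heywood solutions, and the helper names are ours), not a production routine.

```python
import numpy as np
from scipy.optimize import minimize

def ml_factor(S, m, n_iter=50):
    """Iterative ML estimation: eigen-step for Lambda given psi, then maximize over psi."""
    p = S.shape[0]
    psi = 1.0 / np.diag(np.linalg.inv(S))            # initial specific variances

    def neg_support(log_psi, Lam):
        # minus the support (12.20), up to the factor n/2
        V = Lam @ Lam.T + np.diag(np.exp(log_psi))
        _, logdet = np.linalg.slogdet(V)
        return logdet + np.trace(S @ np.linalg.inv(V))

    Lam = None
    for _ in range(n_iter):
        d = 1.0 / np.sqrt(psi)
        A = S * np.outer(d, d) - np.eye(p)           # step 2: psi^{-1/2}(S - psi)psi^{-1/2}
        vals, vecs = np.linalg.eigh(A)               # step 3: spectral decomposition
        idx = np.argsort(vals)[::-1][:m]
        G1 = np.clip(vals[idx], 0.0, None)
        Lam = np.sqrt(psi)[:, None] * vecs[:, idx] * np.sqrt(G1)  # step 4: psi^{1/2} H1 G1^{1/2}
        res = minimize(neg_support, np.log(psi), args=(Lam,))     # maximize (12.20) over psi
        psi = np.exp(res.x)
    return Lam, psi
```

In practice one would also monitor the support function and watch for Heywood cases, in which some entries of ψ̂ collapse towards zero.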

Appendix 12.1 shows the proof that the ML estimation is invariant to linear transformations of the variables. Therefore, the result of the estimation does not depend (as it does with principal components) on whether the covariance or the correlation matrix is used. An additional advantage of maximum likelihood is that we can obtain asymptotic variances of the estimators using the information matrix at the optimum.

We see that when the diagonal terms of the matrix ψ̂ are approximately equal, the ML estimation will lead to results similar to those of the principal factor method. Note that, substituting ψ̂ = kI in the ML estimating equations, both methods use the same normalization and equation (12.22) becomes analogous to (12.11), which is the one solved in the principal factor method.

The EM algorithm

An alternative procedure for maximizing the likelihood is to consider the factors as missing values and to apply the EM algorithm. The joint likelihood function of the data and the factors can be written as

f(x_1, ..., x_n, f_1, ..., f_n) = f(x_1, ..., x_n | f_1, ..., f_n) f(f_1, ..., f_n).

The support for the whole sample is

L(ψ, Λ | X, F) = −(n/2) log|ψ| − (1/2) Σ_i (x_i − Λf_i)′ψ⁻¹(x_i − Λf_i) − (1/2) Σ_i f_i′f_i,    (12.24)

where we assume that the mean of the variables x_i is zero, which is equivalent to replacing the mean by its sample estimator. We observe that, given the factors, the estimation of Λ could be done by regression. On the other hand, given the parameters, we could estimate the factors, as we will see later. In order to apply the EM algorithm we need:

(1) M step: maximize the complete likelihood with respect to Λ and ψ, assuming that the values f_i of the factors are known. This is easy to do, since the rows of Λ are obtained by regressing each variable on the factors, and the diagonal elements of ψ are the residual variances of these regressions.

(2) E step: calculate the expectation of the complete likelihood with respect to the distribution of the f_i given the parameters. Developing (12.24), it can be shown that the expressions which appear in the likelihood are the covariance matrix of the factors and the covariance matrix between the factors and the data. The details of this estimation can be seen in Bartholomew and Knott (1999, p. 49).

Other estimation methods

Because the maximum likelihood method is complicated, other approximate methods have been proposed which give estimators with similar asymptotic properties but simpler calculations. One of these is generalized least squares, which we present next. To justify it, we observe that the ML estimation can be reinterpreted as follows: if there are no restrictions on V, the ML estimator of this matrix is S, and substituting this estimate in (12.19) the support function at its maximum is

−(n/2) log|S| − np/2.

Maximizing the support function is equivalent to minimizing, with respect to V, the discrepancy function obtained by subtracting the support (12.19) from its maximum value. The function obtained from this difference is

F = (n/2) [tr(SV⁻¹) − p − log|SV⁻¹|],

which indicates that we want to make V as close to S as possible, measuring the distance between the two matrices through the trace and the determinant of the product SV⁻¹. We observe that, since V is estimated with restrictions, |SV⁻¹| ≤ 1 and the logarithm will be negative or null. If we concentrate on the first two terms and drop the determinant, the function to be minimized is

F_1 = tr(SV⁻¹) − p = tr(SV⁻¹ − I) = tr[(S − V)V⁻¹],

which computes the differences between the observed matrix S and the estimated V, but gives each difference a weight which depends on the size of V⁻¹. This leads to the idea of generalized least squares (GLS), where we minimize

tr{[(S − V)V⁻¹]²},

and it can be proved that if we iterate the GLS procedure we obtain asymptotically efficient estimators.

Example: We are going to illustrate the ML estimation for the INVEST data. Assuming two factors, we obtain the results shown in the following tables: Table 12.3 gives the variability explained by each factor, and Table 12.4 gives the loadings of Factor 1 and Factor 2 and the specific variances ψ_i² for the variables INTER.A, INTER.B, AGRIC, BIOLO, MEDIC, CHEMIS, ENGIN and PHYSIC.

Table 12.3: Variability explained by the first two factors estimated using maximum likelihood.

Table 12.4: Loading matrix of the factors.

If we compare these results with those obtained using the principal factor method (exercise 12.5) we see that the first factor is similar, although it increases the weight of physics and there are relative differences between the weights of the variables. The second factor shows more changes, but its interpretation is similar as well. The variances of the specific components show few changes between the two approaches. Figures 12.2 and 12.3 show the weights and the projection of the data onto the plane of the factors.

Figure 12.2: Weights of the INVEST variables in the two factors using ML estimation.

12.5 DETERMINING THE NUMBER OF FACTORS

Likelihood test

Suppose that a model with m factors has been estimated. The test of whether the decomposition holds can be set up as a likelihood ratio test:

H_0: V = ΛΛ′ + ψ    versus    H_1: V ≠ ΛΛ′ + ψ.

This test is similar to the partial sphericity test which we studied in Chapter 10, although there are differences due to the fact that here we do not require the specific components to have equal variances. Let V̂_0 be the covariance matrix of the data estimated under H_0. Then the likelihood ratio statistic is

λ = 2(ln L(H_1) − ln L(H_0)).

Figure 12.3: Projection of the countries onto the plane of the two factors using ML estimation.

By (12.19), the likelihood function under H_0 is

ln L(H_0) = −(n/2) log|V̂_0| − (n/2) tr(SV̂_0⁻¹),    (12.25)

whereas under H_1 the estimator of V is S and we have

ln L(H_1) = −(n/2) log|S| − (n/2) tr(SS⁻¹) = −(n/2) log|S| − np/2.    (12.26)

The likelihood ratio can be written using these two expressions as

λ = n (log|V̂_0| + tr(SV̂_0⁻¹) − log|S| − p).

It can be proved (Appendix 12.1) that the ML estimator of V under H_0, V̂_0, minimizes the distance to S measured with the trace, that is,

tr(SV̂_0⁻¹) = p;    (12.27)

therefore, the likelihood ratio is

λ = n log(|V̂_0| / |S|)    (12.28)

and thus measures the distance between V̂_0 and S in terms of the determinant, −n log|SV̂_0⁻¹|, which is the second term of the likelihood.

The test rejects H_0 when λ is greater than the 1 − α percentile of a χ²_g distribution with g degrees of freedom, given by g = dim(H_1) − dim(H_0). The dimension of the parameter space of H_1 is p + p(p − 1)/2 = p(p + 1)/2, equal to the number of distinct elements of V. The dimension of H_0 is pm, given by the matrix Λ, plus the p elements of ψ, minus the m(m − 1)/2 restrictions resulting from the condition that Λ′ψ⁻¹Λ must be diagonal. Therefore:

g = p + p(p − 1)/2 − pm − p + m(m − 1)/2 = (1/2)[(p − m)² − (p + m)].    (12.29)

Bartlett (1954) showed that the asymptotic χ² approximation improves in finite samples if a correction factor is introduced. With this modification, the test rejects H_0 if

(n − 1 − (2p + 4m + 5)/6) ln(|Λ̂Λ̂′ + ψ̂| / |S|) > χ²_{[(p−m)² − (p+m)]/2}(1 − α).    (12.30)

Generally, this test is applied sequentially: the model is estimated with a small value, m = m_1 (which can be m_1 = 1), and H_0 is tested. If it is rejected, we re-estimate with m = m_1 + 1, continuing until H_0 is accepted. An alternative procedure, proposed by Joreskog (1993), which works better against moderate deviations from normality, is the following: calculate the statistic (12.30) for m = 1, ..., m_max. Let X²_1, ..., X²_{m_max} be its values and g_1, ..., g_{m_max} their degrees of freedom. We calculate the differences X²_m − X²_{m+1} and treat them as values of a χ² with g_m − g_{m+1} degrees of freedom. If the value obtained is significant we increase the number of factors, and we proceed in this way until no significant improvement in the fit of the model is found.
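The sequential test is straightforward to code. The sketch below computes, for each m, the Bartlett-corrected statistic (12.30) from ML estimates of Λ and ψ and the sample covariance S, together with its degrees of freedom and p-value. The fitting routine is passed in as an argument; for instance, the illustrative ml_factor sketch from the previous section could be used.

```python
import numpy as np
from scipy.stats import chi2

def lr_number_of_factors(S, n, m_max, fit):
    """Bartlett-corrected likelihood ratio statistics for m = 1, ..., m_max factors.

    fit(S, m) must return (loadings, specific variances), e.g. the ml_factor sketch above.
    """
    p = S.shape[0]
    results = []
    for m in range(1, m_max + 1):
        Lam, psi = fit(S, m)                        # ML estimates for m factors
        V0 = Lam @ Lam.T + np.diag(psi)
        correction = n - 1 - (2 * p + 4 * m + 5) / 6.0
        stat = correction * np.log(np.linalg.det(V0) / np.linalg.det(S))
        g = ((p - m) ** 2 - (p + m)) // 2           # degrees of freedom (12.29)
        results.append((m, stat, g, 1 - chi2.cdf(stat, g)))
    return results
```

The output would be scanned for the smallest m whose p-value exceeds the chosen significance level, or successive statistics compared as in Joreskog's procedure.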

The test (12.28) has an interesting interpretation. The factor model establishes that the difference between the covariance matrix, S, of order (p × p), and a diagonal matrix of rank p, ψ, is approximately a symmetric matrix of rank m, ΛΛ′, that is,

S − ψ̂ ≈ Λ̂Λ̂′.

Premultiplying and postmultiplying by ψ̂^{-1/2} we get that the matrix A, given by

A = ψ̂^{-1/2}Sψ̂^{-1/2} − I,    (12.31)

must be asymptotically equal to the matrix

B = ψ̂^{-1/2}Λ̂Λ̂′ψ̂^{-1/2},    (12.32)

and therefore have rank m asymptotically, instead of rank p. It is shown in Appendix 12.2 that the test (12.28) is equivalent to checking whether the matrix A has rank m, which must be asymptotically true by (12.32), and that the statistic (12.28) can be written as

λ = n Σ_{i=m+1}^p log(1 + d_i)    (12.33)

where the d_i are the p − m smallest eigenvalues of the matrix A. The null hypothesis is rejected if λ is too large compared with the χ² distribution with (1/2)[(p − m)² − p − m] degrees of freedom. In Appendix 12.2 it is shown that this test is a particular case of the likelihood test of partial sphericity of a matrix presented in Chapter 10.

When the sample size is large and m is small compared to p, if the data do not follow a multivariate normal distribution the test generally leads to a rejection of H_0. This is a frequent problem in hypothesis testing with large samples, where we tend to reject H_0. Therefore, when deciding on the number of factors it is necessary to distinguish between practical significance and statistical significance, just as with any hypothesis test. This test is very sensitive to deviations from normality, so in practice the statistic (12.28) is used more as a measure of the fit of the model than as a formal test.

Selection criteria

An alternative to selecting the number of factors by a test is to treat the problem as a choice between models. We then estimate the factor model for different numbers of factors, calculate the support function at its maximum for each model and, applying the Akaike criterion, choose the model for which

AIC(m) = −2L(H_{0,m}) + 2n_p

is minimum. In this expression L(H_{0,m}) is the support function for the model with m factors, evaluated at the ML estimators, which is given by (12.25), and n_p is the number of parameters in the model. We observe that this expression takes into account that, by increasing m, the likelihood L(H_{0,m}) increases, or the deviance −2L(H_{0,m}) decreases, but this effect is balanced by the number of parameters, which enters as a penalty. Using the likelihood (12.20) and the condition (12.27), and since n_p = mp + p − m(m − 1)/2, we have

AIC(m) = n log|Λ̂Λ̂′ + ψ̂| + np + 2m[p + p/m − (m − 1)/2].

The AIC criterion can equivalently be described as minimizing the differences AIC(m) − AIC(H_1), where we subtract from all the models the same quantity, AIC(H_1), the value of AIC for the model which assumes no factor structure and estimates the covariance matrix without restrictions. The function to minimize is then

AIC*(m) = 2(L(H_1) − L(H_{0,m})) − 2g = λ(m) − 2g,

where λ(m) is the difference of supports (12.28), with V̂_0 estimated using m factors, and g is the number of degrees of freedom given by (12.29).

An alternative criterion is the BIC, which we saw in Chapter 11. With this criterion, instead of penalizing the number of parameters by 2 we penalize by log n. Applied to the factor model, using the differences of supports, the criterion is

BIC(m) = λ(m) − g log n.
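Both criteria can be computed directly from the uncorrected statistic λ(m) of (12.28) and the degrees of freedom (12.29). The sketch below, which again takes a fitting routine such as the illustrative ml_factor from the ML section, returns AIC*(m) and BIC(m) for a range of m; the model with the smallest value is selected.

```python
import numpy as np

def aic_bic(S, n, m_max, fit):
    """AIC*(m) = lambda(m) - 2 g and BIC(m) = lambda(m) - g log n for m = 1, ..., m_max.

    fit(S, m) must return (loadings, specific variances), e.g. the ml_factor sketch above.
    """
    p = S.shape[0]
    table = []
    for m in range(1, m_max + 1):
        Lam, psi = fit(S, m)
        V0 = Lam @ Lam.T + np.diag(psi)
        lam = n * np.log(np.linalg.det(V0) / np.linalg.det(S))   # statistic (12.28)
        g = ((p - m) ** 2 - (p + m)) // 2                        # degrees of freedom (12.29)
        table.append((m, lam - 2 * g, lam - g * np.log(n)))
    return table   # choose the m with the smallest AIC* (or BIC) value
```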

Example: We apply the maximum likelihood method to the INVEST data to carry out a test on the number of factors. Basing the test on equation (12.28) we obtain, for each m, the statistic λ, its degrees of freedom g_m, the p-value and the AIC and BIC values. For example, if m = 1 the number of degrees of freedom is (1/2)[(8 − 1)² − (8 + 1)] = 20. We see that for α = 0.05 we cannot reject the null hypothesis that one factor is sufficient. Nevertheless, the Akaike criterion indicates that the minimum is obtained with two factors, while the BIC criterion confirms, by a small difference, the choice of one factor. Since the p-value of the test is at the limit, we are going to compare this test with the procedure proposed by Joreskog. The first step is to use the correction proposed by Bartlett, which we carry out by multiplying the statistic χ²_m by (n − 1 − (2p + 4m + 5)/6)/n. For example, the corrected statistic for m = 1 is

X²_1 = ((20 − 1 − (2·8 + 4·1 + 5)/6)/20) × 31.1 = 23.06,

and a table of the values of m, X²_m, the differences X²_m − X²_{m+1}, the degrees of freedom g_m − g_{m+1} and the corresponding p-values is then built. This method indicates that we reject the hypothesis of one factor but cannot reject the hypothesis of two factors; thus we conclude that the number of factors chosen with Joreskog's method is equal to two. As we see, in this example Joreskog's criterion coincides with Akaike's.

Example: For the HBS data from the Household Budget Survey in the Annex, we apply the factor model estimated by maximum likelihood. The data have been transformed into logarithms to reduce the asymmetry, as was done in the principal components analysis presented in examples 4.2 and 4.3. For this analysis we have also standardized the observations. We accept the test of a single factor given the p-value obtained. The estimated weights of this factor are approximately a weighted average, with less weight given to the sections of food, clothing and footwear, as shown in Table 12.5, which gives the loading of the factor on each of the variables X_1, ..., X_9.

Table 12.5: Loading vector of the factor.

12.6 FACTOR ROTATION

As we saw earlier, the loading matrix is identified only up to multiplication by orthogonal matrices, which are equivalent to rotations. In factor analysis the space spanned by the columns of the loading matrix is defined, but any basis of this space can be a solution. In order to choose from among the possible solutions, the interpretation of the factors is taken into account. Intuitively, it is easier to interpret a factor when it is associated with a block of observed variables. This occurs if the columns of the loading matrix, which represent the effect of each factor on the observed variables, contain high values for certain variables and small values for the others. This idea can be made precise in different ways, which give rise to different criteria for defining the rotation. The coefficients of the orthogonal matrix which defines the rotation are obtained by optimizing an objective function which expresses the desired simplicity of the representation obtained as a result of the rotation. The most frequently used criterion is the varimax.

Varimax criterion

The interpretation of the factors is made easier if the factors which affect some variables do not affect others, and vice versa. This objective leads to the criterion of maximizing the variance of the coefficients which define the effects of each factor on the observed variables. In order to specify this criterion, let δ_{ij} be the coefficients of the loading matrix associated with factor j in the i = 1, ..., p equations after the rotation, and let δ_j be the vector forming column j of the loading matrix after the rotation. We want the variance of the squared coefficients of this vector to be maximal. The coefficients are squared in order to eliminate the signs, since we are interested in their absolute values. Letting δ̄_j = Σ_i δ_{ij}²/p be the mean of the squares of the components of the vector δ_j, the variability for factor j is

(1/p) Σ_{i=1}^p (δ_{ij}² − δ̄_j)² = (1/p) Σ_{i=1}^p δ_{ij}⁴ − (1/p²) (Σ_{i=1}^p δ_{ij}²)²,    (12.34)

and the criterion is to maximize the sum of these variances over all the factors,

q = (1/p) Σ_{j=1}^m Σ_{i=1}^p δ_{ij}⁴ − (1/p²) Σ_{j=1}^m (Σ_{i=1}^p δ_{ij}²)².    (12.35)

Let Λ be the loading matrix estimated initially. The problem is to find an orthogonal matrix M such that the matrix δ = ΛM, whose coefficients are δ_{ij} = λ_i′m_j, where λ_i′ is row i of the matrix Λ and m_j is column j of the matrix M we are looking for, satisfies the condition that these coefficients maximize (12.35).

The terms of the matrix M are obtained by differentiating (12.35) with respect to each of the terms m_{ij}, taking into account the orthogonality restrictions m_i′m_i = 1 and m_i′m_j = 0 (i ≠ j). The result is the varimax rotation.

Example: If we apply a varimax rotation to the ML estimation of the INVEST data from the previous example we obtain the result shown in Figure 12.4. The new loading matrix is the result of multiplying the loading matrix obtained through the ML estimation, shown in example 12.6, by the orthogonal matrix M that defines the rotation, δ = ΛM.

Figure 12.4: The result of applying a varimax rotation to the INVEST factors.
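The varimax rotation is simple to compute. The sketch below uses one common iterative scheme, which applies successive orthogonal Procrustes steps to the gradient of the varimax objective; it is an illustration under these assumptions, not the specific routine used to produce Figure 12.4.

```python
import numpy as np

def varimax(Lam, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (p x m) loading matrix."""
    p, m = Lam.shape
    M = np.eye(m)                        # accumulated rotation matrix
    q_old = 0.0
    for _ in range(max_iter):
        delta = Lam @ M                  # current rotated loadings
        # Gradient of the varimax objective (12.35) with respect to the rotation
        grad = Lam.T @ (delta ** 3 - delta @ np.diag(np.mean(delta ** 2, axis=0)))
        U, s, Vt = np.linalg.svd(grad)
        M = U @ Vt                       # orthogonal Procrustes step
        q = s.sum()
        if q_old != 0 and q / q_old < 1 + tol:
            break
        q_old = q
    return Lam @ M, M                    # rotated loadings and the rotation matrix
```

Applied to an estimated loading matrix Lam, "rotated, M = varimax(Lam)" returns the rotated loadings δ and the rotation matrix M.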

Oblique rotations

The factor model is indeterminate not only with respect to orthogonal rotations but also with respect to oblique rotations. In fact, as we saw when discussing the uniqueness of the model, the model can be established with correlated or with uncorrelated factors. The solution obtained from the estimation of Λ always corresponds to uncorrelated factors, but we can ask ourselves whether there is a solution with correlated factors which has a more interesting interpretation. Mathematically this implies defining new factors f* = Hf, where H is a nonsingular matrix which can, in general, be interpreted as an oblique rotation. The new covariance matrix of the factors is V_{f*} = HH′. There are various procedures for obtaining oblique rotations, such as Quartimin, Oblimax, Promax, etc., on which more information can be found in the specialized literature. The problem with oblique rotations is that the factors are correlated and thus cannot be interpreted independently.

12.7 FACTOR ESTIMATION

In many problems the interest of factor analysis lies in determining the loading matrix, not in the particular values of the factors for the elements of the sample. Nevertheless, in other cases we want to obtain the values of the factors for the observed elements. There are two procedures for estimating the factors: the first, introduced by Bartlett, assumes that the vector of factor values for each observation is a parameter to be estimated; the second assumes that this vector is a random variable. Next we briefly review both procedures.

The factors as parameters

The (p × 1) vector of values of the variables for individual i, x_i, has a normal distribution with mean Λf_i, where f_i is the (m × 1) vector of the factors for element i of the sample, and covariance matrix ψ:

x_i ~ N_p(Λf_i, ψ).

The parameters f_i can be estimated by maximum likelihood, as shown in the Appendix. The resulting estimator is the generalized least squares estimator, given by

f̂_i = (Λ̂′ψ̂⁻¹Λ̂)⁻¹ Λ̂′ψ̂⁻¹ x_i,    (12.36)

which has a clearly intuitive interpretation: if we know Λ, the factor model x_i = Λf_i + u_i is a regression model with dependent variable x_i, explanatory variables the columns of Λ and parameters f_i. Since the perturbation u_i is distributed not as N(0, I) but as N(0, ψ), we have to use generalized least squares, which leads to (12.36).

The factors as random variables

The second method is to assume that the factors are random variables and to look for the linear predictor which minimizes the mean square error of prediction. As before, let f_i be the values of the factors for individual i and x_i the vector of observed variables.
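The Bartlett estimator (12.36) is a one-line computation once Λ̂ and ψ̂ are available. The sketch below applies it to every row of a data matrix assumed to be already centred. As a hedged aside (it is not derived in the excerpt above), the linear predictor that the random-variable approach leads to is usually written Λ̂′(Λ̂Λ̂′ + ψ̂)⁻¹x_i, and it is included here only for comparison.

```python
import numpy as np

def bartlett_scores(X, Lam, psi):
    """Bartlett (GLS) factor scores, equation (12.36), for each row of the centred data X."""
    P = np.diag(1.0 / psi)                              # psi^{-1}
    A = np.linalg.inv(Lam.T @ P @ Lam) @ Lam.T @ P      # (L' psi^-1 L)^-1 L' psi^-1
    return X @ A.T                                      # one (m,) score vector per row

def regression_scores(X, Lam, psi):
    """'Regression' predictor of the factors (stated here for comparison only)."""
    V = Lam @ Lam.T + np.diag(psi)
    return X @ np.linalg.solve(V, Lam)                  # x_i' V^{-1} Lambda for each row
```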


Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

More information

Factor Analysis. Qian-Li Xue

Factor Analysis. Qian-Li Xue Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale

More information

UCLA STAT 233 Statistical Methods in Biomedical Imaging

UCLA STAT 233 Statistical Methods in Biomedical Imaging UCLA STAT 233 Statistical Methods in Biomedical Imaging Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology University of California, Los Angeles, Spring 2004 http://www.stat.ucla.edu/~dinov/

More information

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II the Contents the the the Independence The independence between variables x and y can be tested using.

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

The Common Factor Model. Measurement Methods Lecture 15 Chapter 9

The Common Factor Model. Measurement Methods Lecture 15 Chapter 9 The Common Factor Model Measurement Methods Lecture 15 Chapter 9 Today s Class Common Factor Model Multiple factors with a single test ML Estimation Methods New fit indices because of ML Estimation method

More information

ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY

ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY José A. Díaz-García and Raúl Alberto Pérez-Agamez Comunicación Técnica No I-05-11/08-09-005 (PE/CIMAT) About principal components under singularity José A.

More information

LECTURE NOTE #11 PROF. ALAN YUILLE

LECTURE NOTE #11 PROF. ALAN YUILLE LECTURE NOTE #11 PROF. ALAN YUILLE 1. NonLinear Dimension Reduction Spectral Methods. The basic idea is to assume that the data lies on a manifold/surface in D-dimensional space, see figure (1) Perform

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Lecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan

Lecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always

More information

The 3 Indeterminacies of Common Factor Analysis

The 3 Indeterminacies of Common Factor Analysis The 3 Indeterminacies of Common Factor Analysis James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) The 3 Indeterminacies of Common

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Canonical Edps/Soc 584 and Psych 594 Applied Multivariate Statistics Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Canonical Slide

More information

Forecast comparison of principal component regression and principal covariate regression

Forecast comparison of principal component regression and principal covariate regression Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric

More information

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test. Economics 52 Econometrics Professor N.M. Kiefer LECTURE 1: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING NEYMAN-PEARSON LEMMA: Lesson: Good tests are based on the likelihood ratio. The proof is easy in the

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental

More information

Appendix A: The time series behavior of employment growth

Appendix A: The time series behavior of employment growth Unpublished appendices from The Relationship between Firm Size and Firm Growth in the U.S. Manufacturing Sector Bronwyn H. Hall Journal of Industrial Economics 35 (June 987): 583-606. Appendix A: The time

More information

Principal Component Analysis (PCA) Theory, Practice, and Examples

Principal Component Analysis (PCA) Theory, Practice, and Examples Principal Component Analysis (PCA) Theory, Practice, and Examples Data Reduction summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables. p k n A

More information

Eigenvalues and diagonalization

Eigenvalues and diagonalization Eigenvalues and diagonalization Patrick Breheny November 15 Patrick Breheny BST 764: Applied Statistical Modeling 1/20 Introduction The next topic in our course, principal components analysis, revolves

More information

Exploratory Factor Analysis: dimensionality and factor scores. Psychology 588: Covariance structure and factor models

Exploratory Factor Analysis: dimensionality and factor scores. Psychology 588: Covariance structure and factor models Exploratory Factor Analysis: dimensionality and factor scores Psychology 588: Covariance structure and factor models How many PCs to retain 2 Unlike confirmatory FA, the number of factors to extract is

More information

FEEG6017 lecture: Akaike's information criterion; model reduction. Brendan Neville

FEEG6017 lecture: Akaike's information criterion; model reduction. Brendan Neville FEEG6017 lecture: Akaike's information criterion; model reduction Brendan Neville bjn1c13@ecs.soton.ac.uk Occam's razor William of Occam, 1288-1348. All else being equal, the simplest explanation is the

More information

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October Finding normalized and modularity cuts by spectral clustering Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu Ljubjana 2010, October Outline Find

More information

3.1. The probabilistic view of the principal component analysis.

3.1. The probabilistic view of the principal component analysis. 301 Chapter 3 Principal Components and Statistical Factor Models This chapter of introduces the principal component analysis (PCA), briefly reviews statistical factor models PCA is among the most popular

More information

2/26/2017. This is similar to canonical correlation in some ways. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

2/26/2017. This is similar to canonical correlation in some ways. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 What is factor analysis? What are factors? Representing factors Graphs and equations Extracting factors Methods and criteria Interpreting

More information

Key Algebraic Results in Linear Regression

Key Algebraic Results in Linear Regression Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in

More information

Appendix A: Matrices

Appendix A: Matrices Appendix A: Matrices A matrix is a rectangular array of numbers Such arrays have rows and columns The numbers of rows and columns are referred to as the dimensions of a matrix A matrix with, say, 5 rows

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

Retained-Components Factor Transformation: Factor Loadings and Factor Score Predictors in the Column Space of Retained Components

Retained-Components Factor Transformation: Factor Loadings and Factor Score Predictors in the Column Space of Retained Components Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 6 11-2014 Retained-Components Factor Transformation: Factor Loadings and Factor Score Predictors in the Column Space of Retained

More information

NUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction

NUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction NUCLEAR NORM PENALIZED ESTIMATION OF IERACTIVE FIXED EFFECT MODELS HYUNGSIK ROGER MOON AND MARTIN WEIDNER Incomplete and Work in Progress. Introduction Interactive fixed effects panel regression models

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] 1 Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] Insights: Price movements in one market can spread easily and instantly to another market [economic globalization and internet

More information

Psychology 454: Latent Variable Modeling How do you know if a model works?

Psychology 454: Latent Variable Modeling How do you know if a model works? Psychology 454: Latent Variable Modeling How do you know if a model works? William Revelle Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 18 Outline 1 Goodness

More information

Structural Equation Modeling and Confirmatory Factor Analysis. Types of Variables

Structural Equation Modeling and Confirmatory Factor Analysis. Types of Variables /4/04 Structural Equation Modeling and Confirmatory Factor Analysis Advanced Statistics for Researchers Session 3 Dr. Chris Rakes Website: http://csrakes.yolasite.com Email: Rakes@umbc.edu Twitter: @RakesChris

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Dimensionality Reduction

Dimensionality Reduction 394 Chapter 11 Dimensionality Reduction There are many sources of data that can be viewed as a large matrix. We saw in Chapter 5 how the Web can be represented as a transition matrix. In Chapter 9, the

More information

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses.

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses. ISQS 6348 Final exam solutions. Name: Open book and notes, but no electronic devices. Answer short answer questions on separate blank paper. Answer multiple choice on this exam sheet. Put your name on

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

CONFIRMATORY FACTOR ANALYSIS

CONFIRMATORY FACTOR ANALYSIS 1 CONFIRMATORY FACTOR ANALYSIS The purpose of confirmatory factor analysis (CFA) is to explain the pattern of associations among a set of observed variables in terms of a smaller number of underlying latent

More information

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation 1 Outline. 1. Motivation 2. SUR model 3. Simultaneous equations 4. Estimation 2 Motivation. In this chapter, we will study simultaneous systems of econometric equations. Systems of simultaneous equations

More information

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where:

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where: VAR Model (k-variate VAR(p model (in the Reduced Form: where: Y t = A + B 1 Y t-1 + B 2 Y t-2 + + B p Y t-p + ε t Y t = (y 1t, y 2t,, y kt : a (k x 1 vector of time series variables A: a (k x 1 vector

More information

Factor Analysis of Data Matrices

Factor Analysis of Data Matrices Factor Analysis of Data Matrices PAUL HORST University of Washington HOLT, RINEHART AND WINSTON, INC. New York Chicago San Francisco Toronto London Contents Preface PART I. Introductory Background 1. The

More information

More Linear Algebra. Edps/Soc 584, Psych 594. Carolyn J. Anderson

More Linear Algebra. Edps/Soc 584, Psych 594. Carolyn J. Anderson More Linear Algebra Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University of Illinois

More information

LEAST SQUARES METHODS FOR FACTOR ANALYSIS. 1. Introduction

LEAST SQUARES METHODS FOR FACTOR ANALYSIS. 1. Introduction LEAST SQUARES METHODS FOR FACTOR ANALYSIS JAN DE LEEUW AND JIA CHEN Abstract. Meet the abstract. This is the abstract. 1. Introduction Suppose we have n measurements on each of m variables. Collect these

More information

This section is an introduction to the basic themes of the course.

This section is an introduction to the basic themes of the course. Chapter 1 Matrices and Graphs 1.1 The Adjacency Matrix This section is an introduction to the basic themes of the course. Definition 1.1.1. A simple undirected graph G = (V, E) consists of a non-empty

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Method of principal factors estimation of optimal number of factors: an information criteria approach

Method of principal factors estimation of optimal number of factors: an information criteria approach American Journal of Theoretical and Applied Statistics 2013; 2(6): 166-175 Published online October 30, 2013 (http://www.sciencepublishinggroup.com/j/ajtas) doi: 10.11648/j.ajtas.20130206.13 Method of

More information

Lecture Stat Information Criterion

Lecture Stat Information Criterion Lecture Stat 461-561 Information Criterion Arnaud Doucet February 2008 Arnaud Doucet () February 2008 1 / 34 Review of Maximum Likelihood Approach We have data X i i.i.d. g (x). We model the distribution

More information

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63 1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Seminar on Linear Algebra

Seminar on Linear Algebra Supplement Seminar on Linear Algebra Projection, Singular Value Decomposition, Pseudoinverse Kenichi Kanatani Kyoritsu Shuppan Co., Ltd. Contents 1 Linear Space and Projection 1 1.1 Expression of Linear

More information

Machine Learning (Spring 2012) Principal Component Analysis

Machine Learning (Spring 2012) Principal Component Analysis 1-71 Machine Learning (Spring 1) Principal Component Analysis Yang Xu This note is partly based on Chapter 1.1 in Chris Bishop s book on PRML and the lecture slides on PCA written by Carlos Guestrin in

More information

CHAPTER 7 INTRODUCTION TO EXPLORATORY FACTOR ANALYSIS. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum

CHAPTER 7 INTRODUCTION TO EXPLORATORY FACTOR ANALYSIS. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum CHAPTER 7 INTRODUCTION TO EXPLORATORY FACTOR ANALYSIS From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum 1997 144 CHAPTER 7 INTRODUCTION TO EXPLORATORY FACTOR ANALYSIS Factor analytic

More information

M M Cross-Over Designs

M M Cross-Over Designs Chapter 568 Cross-Over Designs Introduction This module calculates the power for an x cross-over design in which each subject receives a sequence of treatments and is measured at periods (or time points).

More information

Bare minimum on matrix algebra. Psychology 588: Covariance structure and factor models

Bare minimum on matrix algebra. Psychology 588: Covariance structure and factor models Bare minimum on matrix algebra Psychology 588: Covariance structure and factor models Matrix multiplication 2 Consider three notations for linear combinations y11 y1 m x11 x 1p b11 b 1m y y x x b b n1

More information

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Dimensionality Reduction Techniques (DRT)

Dimensionality Reduction Techniques (DRT) Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information