Essays on Estimation Methods for Factor Models and Structural Equation Models


Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Social Sciences 111 Essays on Estimation Methods for Factor Models and Structural Equation Models SHAOBO JIN ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2015 ISSN 1652-9030 ISBN 978-91-554-9199-4 urn:nbn:se:uu:diva-247292

Dissertation presented at Uppsala University to be publicly examined in Hörsal 2, Ekonomikum, Kyrkogårdsgatan 10, Uppsala, Friday, 8 May 2015 at 10:15 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Li Cai (UCLA).

Abstract
Jin, S. 2015. Essays on Estimation Methods for Factor Models and Structural Equation Models. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Social Sciences 111. 29 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9199-4.

This thesis, which consists of four papers, is concerned with estimation methods in factor analysis and structural equation models. New estimation methods are proposed and investigated. In Paper I an approximation of the penalized maximum likelihood (ML) is introduced to fit an exploratory factor analysis model. Approximated penalized ML continuously and efficiently shrinks the factor loadings towards zero. It naturally factorizes a covariance matrix or a correlation matrix, and it is applicable to both an orthogonal and an oblique structure. Paper II, a simulation study, investigates the properties of approximated penalized ML with an orthogonal factor model. Different combinations of penalty terms and tuning parameter selection methods are examined, and differences between factorizing a covariance matrix and factorizing a correlation matrix are also explored. It is shown that approximated penalized ML frequently improves on the traditional estimation-rotation procedure. In Paper III we focus on pseudo ML for multi-group data. Data from different groups are pooled and normal theory is used to fit the model. It is shown that pseudo ML produces consistent estimators of the factor loadings and that it is numerically easier than multi-group ML. However, normal theory is not applicable for estimating standard errors, so a sandwich-type estimator of the standard errors is derived.
Paper IV examines the properties of the recently proposed polychoric instrumental variable (PIV) estimators for ordinal data through a simulation study. PIV is compared with conventional estimation methods (unweighted least squares and diagonally weighted least squares). PIV produces accurate estimates of the factor loadings and factor covariances in a correctly specified confirmatory factor analysis model, and accurate estimates of the loadings and coefficient matrices in a correctly specified structural equation model. If the model is misspecified, the robustness of PIV depends on the model complexity, the underlying distribution, and the instrumental variables.

Keywords: shrinkage, factor rotation, penalized maximum likelihood, pseudo-maximum likelihood, multi-group analysis, ordinal data, robustness

Shaobo Jin, Department of Statistics, Uppsala University, SE-75120 Uppsala, Sweden.

© Shaobo Jin 2015

ISSN 1652-9030
ISBN 978-91-554-9199-4
urn:nbn:se:uu:diva-247292 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-247292)

Dedicated to my parents

List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Jin, S., Moustaki, I., and Yang-Wallentin, F. (2015) Approximated Penalized Maximum Likelihood for Exploratory Factor Analysis. Manuscript.
II Jin, S., Moustaki, I., and Yang-Wallentin, F. (2015) Approximated Penalized Maximum Likelihood for Exploratory Factor Analysis: A Simulation Study of an Orthogonal Case. Manuscript.
III Jin, S., Yang-Wallentin, F., and Christoffersson, A. (2015) Asymptotic Efficiency of the Pseudo-Maximum Likelihood Estimator in Multi-Group Factor Models with Pooled Data. Accepted with minor revision.
IV Jin, S., Luo, H., and Yang-Wallentin, F. (2015) A Simulation Study of Polychoric Instrumental Variable Estimation in Structural Equation Models. Under revision.

Contents

1 Introduction ................................................................. 9
2 Background ................................................................. 10
  2.1 Exploratory factor analysis ............................................ 10
  2.2 Factor rotation ........................................................ 10
  2.3 Penalized least squares ................................................ 11
  2.4 Penalized maximum likelihood ........................................... 12
  2.5 Confirmatory factor analysis ........................................... 13
  2.6 Multi-group analysis ................................................... 13
  2.7 Violation of the normality assumption .................................. 14
  2.8 Structural equation model .............................................. 15
  2.9 Ordinal data ........................................................... 15
  2.10 Polychoric instrumental variable ...................................... 16
3 Research goals ............................................................. 18
4 Summary of papers .......................................................... 20
  4.1 Paper I ................................................................ 20
  4.2 Paper II ............................................................... 21
  4.3 Paper III .............................................................. 22
  4.4 Paper IV ............................................................... 22
5 Conclusion ................................................................. 24
References ................................................................... 27

1. Introduction

Factor analysis is a multivariate technique that decomposes a covariance matrix or a correlation matrix among observed variables into equations of unobserved (latent) factors. A large number of variables, also known as indicators, are observed through interviews, questionnaires, psychological tests, etc. The indicators are assumed to be driven by a smaller number of common factors that are unobservable. The covariance structures among indicators and common factors can be explored using factor analysis, especially in the social and behavioral sciences.

Factor analysis can be categorized into two types according to its purpose: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). EFA is commonly used to search for possible relations between indicators and latent factors in order to account for the correlations among the indicators. Researchers usually do not have a substantive theory, or any other kind of theory, before EFA is conducted; the meanings of the common factors are assigned after the factor model is fitted. Hence, no model is clearly defined a priori and, as its name implies, EFA is an exploratory technique used to search for a parsimonious representation of a set of variables. The number of latent factors is typically much smaller than the number of indicators, so EFA can also be viewed as a data reduction technique. In contrast, CFA allows researchers to test hypotheses about the relationships between observed variables and latent factors. Researchers begin with a hypothesis stating how the indicators are related to the latent factors, and a CFA model is predefined based on the theory and/or hypothesis of interest. As an extension of the CFA model, a structural equation model (SEM) enables researchers to model not only the relations between indicators and latent factors but also the relations among the latent factors. The SEM, which incorporates simultaneous equation models as a special case, is a multivariate regression model. However, it differs from the usual multivariate regression model in that both the response and the explanatory variables in a SEM can be latent.

This thesis, which consists of four papers, is devoted to estimation methods for factor analysis and SEM. New estimation methods are discussed, and recently proposed methods are investigated.

2. Background

2.1 Exploratory factor analysis

An EFA model is of the form

y = µ + Λf + ε, (2.1)

where y is a p × 1 vector of indicators (also known as manifest variables), µ is a vector of intercepts, Λ is a loading matrix with (i, j)th element λ_ij, f ∼ N(0, Φ) is an m × 1 vector of common factors, ε ∼ N(0, Ψ) is the error term, and Ψ is a diagonal matrix with diagonal elements ψ_i for i = 1, 2, ..., p. The common factor f and the error term ε are assumed to be uncorrelated. Model (2.1) implies the covariance matrix Σ(θ) = ΛΦΛ^T + Ψ, where θ is the vector of unknown parameters.

Several types of indeterminacy exist in the EFA model. Let P be any invertible matrix and consider Λ* = ΛP. The same model-implied covariance matrix is obtained if Φ* = P^{-1}Φ(P^T)^{-1}, since ΛΦΛ^T = Λ*Φ*Λ*^T. Hence, the factor covariance matrix Φ is not uniquely identified. To estimate the unknown parameters in Model (2.1), Φ is usually fixed to an identity matrix, I. The model with Φ = I is referred to as an orthogonal model; otherwise, it is an oblique model. However, the model-implied covariance matrix is still invariant under orthogonal transformations: letting the invertible matrix P be any orthogonal matrix, the identity ΛΛ^T = Λ*Λ*^T holds. Hence the loading matrix can be rotated orthogonally while the covariance matrix remains the same. Restrictions such as requiring Λ^T Ψ^{-1} Λ to be diagonal with its diagonal elements in decreasing order (Jöreskog, 1967) remove the rotational indeterminacy.

Maximum likelihood (ML) is commonly used to fit Model (2.1) and has been discussed by several authors (e.g., Jöreskog, 1967; Lawley, 1940). Without loss of generality, we let µ = 0. The ML estimator then minimizes

F(θ) = (n/2) log|Σ(θ)| + (n/2) tr[SΣ(θ)^{-1}], (2.2)

where n is the sample size and S is the sample covariance matrix.
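As a numerical illustration of Model (2.1) and the fit function (2.2), the following minimal sketch builds the implied covariance matrix of a hypothetical 6-indicator, 2-factor orthogonal model (the loading values are invented, not from the thesis) and checks that the ML discrepancy is smallest at the data-generating parameters when S equals the population covariance:

```python
import numpy as np

def implied_cov(Lambda, Psi_diag):
    """Model-implied covariance of an orthogonal EFA model: Sigma = Lambda Lambda^T + Psi."""
    return Lambda @ Lambda.T + np.diag(Psi_diag)

def ml_fit(S, Sigma, n):
    """ML discrepancy F(theta) = (n/2) log|Sigma| + (n/2) tr(S Sigma^{-1}), as in Eq. (2.2)."""
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * n * (logdet + np.trace(S @ np.linalg.inv(Sigma)))

# Hypothetical perfect-simple loading matrix; unique variances chosen
# so that the indicators have unit variance
Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
Psi = 1.0 - np.sum(Lambda**2, axis=1)
Sigma_true = implied_cov(Lambda, Psi)

# With S equal to the population covariance, F is minimized at the true parameters
S = Sigma_true
f_true = ml_fit(S, Sigma_true, n=200)
f_off = ml_fit(S, implied_cov(0.9 * Lambda, Psi), n=200)
print(f_true < f_off)
```

The comparison reflects the standard fact that log|Σ| + tr(SΣ^{-1}) is uniquely minimized over positive definite Σ at Σ = S.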
2.2 Factor rotation

Because of rotational indeterminacy, an orthogonal EFA model is estimated first and the estimated loading matrix is then rotated to produce an interpretable loading matrix, possibly with a few large loadings and many small loadings. After factor rotation, the common factors can be either uncorrelated or correlated. A rotation method that results in an orthogonal structure is referred to as an orthogonal rotation; likewise, an oblique rotation produces an oblique structure. Commonly used orthogonal rotations are the quartimax (Neuhaus & Wrigley, 1954) and varimax (Kaiser, 1958) rotations, while commonly used oblique rotations include the quartimin (Carroll, 1953) and promax (Hendrickson & White, 1964) rotations. Further discussions of rotation can be found, for example, in Browne (2001) and Yanai & Ichikawa (2006).

Factor rotations typically produce many small non-zero loadings. A sparse loading matrix with many zero elements is usually easier to interpret than a dense loading matrix. Consequently, factor loadings are often truncated at some user-specified level to produce a loading matrix with zero loadings. One rule of thumb is to treat loadings with absolute value greater than 0.3 as sufficient loadings (Hair et al., 2010).

2.3 Penalized least squares

Truncating a small loading is similar to truncating a small regression coefficient, which is, in fact, a hard-thresholding approach (Fan & Li, 2001): parameter estimates whose absolute values are less than a certain level are set to zero. Fan & Li (2001) argued that selecting variables by hard thresholding is inferior to a soft-thresholding approach in which coefficients are continuously shrunk towards zero. Soft thresholding has been widely studied in linear regression models and generalized linear models as a way to conduct estimation and variable selection simultaneously. One of the most famous soft-thresholding techniques is the least absolute shrinkage and selection operator (LASSO) proposed by Tibshirani (1996).
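The hard-thresholding (truncation) rule described above can be sketched as follows; the rotated loading values are hypothetical:

```python
import numpy as np

def hard_threshold(loadings, cutoff=0.3):
    """Hard thresholding: zero out loadings whose absolute value is below the cutoff."""
    L = np.asarray(loadings, dtype=float).copy()
    L[np.abs(L) < cutoff] = 0.0
    return L

# Hypothetical rotated loading matrix with several small non-zero entries
L_rot = np.array([[0.82, 0.05],
                  [0.74, -0.12],
                  [0.08, 0.69],
                  [-0.21, 0.77]])
print(hard_threshold(L_rot))
```

Loadings of 0.05, −0.12, 0.08, and −0.21 are set to zero, while the large loadings pass through unchanged; the soft-thresholding methods below instead shrink all coefficients continuously.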
For the linear model

y = Xθ + ε, (2.3)

the LASSO estimator is

argmin_θ (y − Xθ)^T (y − Xθ) + β‖θ‖₁, (2.4)

where β > 0 is a tuning parameter and ‖θ‖₁ is the sum of the absolute values of the elements of θ. The LASSO conducts parameter estimation and variable selection simultaneously by shrinking coefficients towards zero and creating exactly zero estimates. Note that ordinary least squares typically does not produce exactly zero coefficient estimates. The LASSO estimator of θ is a function of the tuning parameter β. As a simple example, the minimizer of

(1/2)(y − θ)² + β|θ| (2.5)

is

θ̂ = y − β if y > β;  y + β if y < −β;  0 otherwise, (2.6)

(Donoho & Johnstone, 1994). The tuning parameter β controls the magnitude of θ̂, in particular the values of y for which θ̂ is shrunk to zero. Therefore, θ̂ should be understood as a function θ̂(β), which is referred to as the solution path of θ. In the case of multiple regressors, every θ̂_i, the ith element of θ̂, is a function of β, and the solution path of θ consists of the solution paths of all the θ_i. The LASSO has been generalized to many other types of penalty terms: for instance, the smoothly clipped absolute deviation (SCAD) (Fan & Li, 2001), the elastic net (EN) (Zou & Hastie, 2005), and the minimax concave penalty with plus algorithm (MC+) (Zhang, 2010). See Tibshirani (2011) for a review of the LASSO and its variants.

2.4 Penalized maximum likelihood

Selecting factor loadings after factor rotation is essentially a coefficient selection problem. Hence, the aforementioned techniques are naturally applicable to an EFA model. Choi et al. (2010) studied the penalized orthogonal EFA model with a LASSO penalty in the likelihood function. The penalized ML estimator with the LASSO penalty minimizes

(n/2) log|Σ(θ)| + (n/2) tr[SΣ(θ)^{-1}] + nβ ∑_{i=1}^{p} ∑_{j=1}^{m} |λ_ij|. (2.7)

The LASSO penalty enables simultaneous estimation and shrinkage of the factor loadings. Since the penalty term nβ ∑_{i=1}^{p} ∑_{j=1}^{m} |λ_ij| is not differentiable, traditional optimization methods are not applicable; an EM-type algorithm is proposed in Choi et al. (2010) to minimize (2.7).

One advantage of penalized ML is that it removes rotational indeterminacy. Recall that for every orthogonal matrix P, ΛΛ^T = Λ*Λ*^T, where Λ* = ΛP, and the covariance matrix of f is still an identity matrix. An orthogonal rotation does not change the sum of the first two terms in (2.7), i.e. the unpenalized ML fit function. However, ∑_{i=1}^{p} ∑_{j=1}^{m} |λ_ij| in the third term generally changes, unless the orthogonal matrix is a permutation matrix or an alternating sign matrix. Therefore, the penalized ML estimator is rotation-free. As a consequence, an orthogonal EFA model using penalized ML cannot be rotated to an oblique EFA model and vice versa.
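The closed-form soft-thresholding solution (2.6) can be checked numerically; a minimal sketch, not code from the papers:

```python
import numpy as np

def soft_threshold(y, beta):
    """Closed-form minimizer of (1/2)(y - theta)^2 + beta*|theta|, as in Eq. (2.6)."""
    return np.sign(y) * np.maximum(np.abs(y) - beta, 0.0)

# Compare the closed form with a brute-force grid minimization of the objective
beta = 0.5
for y in (-2.0, -0.3, 0.0, 0.4, 1.7):
    grid = np.linspace(-3, 3, 600001)
    objective = 0.5 * (y - grid) ** 2 + beta * np.abs(grid)
    brute = grid[np.argmin(objective)]
    assert abs(soft_threshold(y, beta) - brute) < 1e-4
print("closed form matches brute force")
```

Observations with |y| ≤ β are shrunk exactly to zero, while larger observations are moved towards zero by β, which is the "continuous shrinkage" contrasted with hard thresholding above.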

More generally, the penalized ML estimator minimizes

(n/2) log|Σ(θ)| + (n/2) tr[SΣ(θ)^{-1}] + nP(|Λ|; β, w), (2.8)

where P is a scalar-valued penalty function, β is a vector of tuning parameters, w is a vector of weights on the factor loadings, and the absolute-value operator is applied to Λ element-wise. In particular, Hirose & Yamamoto (2014a,b) considered the MC+ penalty in both orthogonal and oblique EFA.

2.5 Confirmatory factor analysis

A CFA model has the same form as the EFA model (2.1) but with constraints on the parameters. Some common constraints are zero loadings, equal loadings, correlated error terms, and restricted factor covariances. These constraints represent the theories that researchers wish to test before fitting the model. If ML is used to fit a CFA model, the fit function (2.2) is minimized. The ML estimator θ̂ is asymptotically normal with asymptotic covariance matrix

Ξ = 2 {Δ^T [Σ(θ)^{-1} ⊗ Σ(θ)^{-1}] Δ}^{-1},

where Δ = ∂Vec[Σ(θ)]/∂θ^T and the operator Vec(·) stacks the columns of the enclosed matrix (Magnus & Neudecker, 1999). Ξ is in fact the inverse of the information matrix.

2.6 Multi-group analysis

In the case where a common model is suitable for observations from different groups, a multi-group analysis is conducted. A single-group CFA model of the form (2.1) is a special case of a multi-group model. A multi-group CFA model is of the form

y_g = µ_g + Λ_g f_g + ε_g,  g = 1, 2, ..., G, (2.9)

where the subscript g represents the gth group, and the groups are mutually independent. The parameter vector θ contains all the free parameters in the mean vectors µ_g, the factor means E(f_g), the loading matrices Λ_g, the covariance matrices of the common factors Φ_g, and the error variances Ψ_g for all g. Model (2.9) is a general expression of a multi-group CFA. Different levels of measurement invariance can be imposed (Meredith, 1993), under which some parameters remain the same across the groups. In particular, strong measurement invariance assumes that µ_g = µ and Λ_g = Λ for all groups. This thesis is restricted to the case of strong measurement invariance.

In the single-group analysis µ and E(f) are not identified without the assumption that E(f) = 0. In the multi-group analysis the relative means of the f_g

are identified if E(f_1) = 0 (Sörbom, 1974), i.e. the factor mean in the first group is 0. Let the number of observations in the gth group be n_g, with corresponding proportion p_g. The multi-group ML estimator minimizes

F^(ML) = ∑_{g=1}^{G} p_g ( log|Σ_g(θ)| + tr{S_g Σ_g(θ)^{-1}} + tr{(ȳ_g − µ_g)(ȳ_g − µ_g)^T Σ_g(θ)^{-1}} ), (2.10)

where Σ_g is the model-implied covariance matrix of group g, S_g is the sample covariance matrix of group g, and ȳ_g is the sample mean of group g.

2.7 Violation of the normality assumption

One crucial aspect of the factor models introduced above is the multivariate normality assumption from which the likelihood function is formulated. The normality assumption is often questionable in practice; the consequences of violating it are therefore particularly noteworthy. If the normality assumption fails but the fit function (2.2) is still used, the method is called pseudo ML; see White (1982) for a general discussion. The pseudo ML estimator has been shown to be consistent and asymptotically normally distributed under some general conditions for factor analysis (Anderson & Amemiya, 1988). In the single-group CFA the asymptotic covariance matrix is a sandwich-type matrix of the form Ξ(Ω^T H V H Ω)Ξ (Yuan & Bentler, 1997), where Ω = ∂Vech[Σ(θ)]/∂θ^T, the operator Vech(·) vectorizes the lower triangular elements of the enclosed symmetric matrix, H = D^T[Σ(θ)^{-1} ⊗ Σ(θ)^{-1}]D/2 with D being a duplication matrix (Magnus & Neudecker, 1999) such that Vec(Q) = D Vech(Q) for a symmetric matrix Q, and V is the asymptotic covariance matrix of Vech(S). When the normality assumption holds, the sandwich-type covariance matrix reduces to Ξ, the covariance matrix under normal data. Hence non-normality does not necessarily invalidate normal theory, and the asymptotic covariance matrix Ξ may still be valid for computing standard errors.

Numerous studies have been devoted to the consequences of violating the normality assumption, especially the conditions under which normal theory remains valid for non-normal data. In the single-group analysis, Amemiya et al. (1987) and Anderson & Amemiya (1988) showed that, under some conditions, normal theory-based standard errors are still valid even though the normality assumption is violated. Similar results for the multi-group analysis can be found in Satorra (2002) and Papadopoulos & Amemiya (2005).
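A univariate analogue illustrates why normal-theory standard errors can fail under non-normality. For the pseudo-ML estimate of a variance, the normal-theory asymptotic variance 2σ⁴/n understates the robust (sandwich-type) variance (µ₄ − σ⁴)/n whenever the kurtosis exceeds that of a normal distribution; a minimal sketch with a hypothetical chi-square sample (not from Paper III):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.chisquare(df=3, size=200_000)   # skewed, heavy-tailed: normality violated
n = x.size
s2 = x.var()
m4 = np.mean((x - x.mean()) ** 4)

var_normal = 2 * s2**2 / n       # normal-theory variance of the variance estimate
var_sandwich = (m4 - s2**2) / n  # robust (sandwich-type) variance

print(var_sandwich > var_normal)
```

For a chi-square distribution with 3 degrees of freedom the excess kurtosis is 4, so the robust variance is roughly three times the normal-theory one, and normal-theory standard errors would be too small.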

2.8 Structural equation model

As a generalization of the CFA model, the full SEM is

η = Bη + Γξ + ζ, (2.11)
x = Λ_x ξ + δ,  y = Λ_y η + ε, (2.12)

where x and y are vectors of indicators, ξ and η are vectors of latent variables, and ζ, δ, and ε are disturbances. The matrices Λ_x and Λ_y are factor loading matrices, and B and Γ are coefficient matrices describing the relations among the latent factors. The latent vector ξ and the disturbances δ, ε, and ζ are mutually uncorrelated. The SEM reduces to a factor model if B and Γ are zero. Equation (2.11) is the structural model for the latent variables and Equation (2.12) is the measurement model that links the latent variables with the indicators. As with the CFA model, ML is often used to estimate the unknown parameters in a full SEM under the assumption of normally distributed indicators. If the normality assumption is violated, the discussion in Section 2.7 applies essentially unchanged to a full SEM.

2.9 Ordinal data

So far, the indicators in the above models have been continuous. However, ordinal data are very common in practice; for example, Likert scales are widely used in questionnaires. Conceptually, ordinal data cannot be treated as continuous data, since the mean and variance are not identified. There are several ways of incorporating ordinal data into a CFA model or a SEM. In particular, the underlying distribution approach assumes that the observed variables are the counterparts of underlying continuous variables. The values of an indicator y are defined through a continuous variable y* as

y = i  if  τ_{i−1} < y* < τ_i,  i = 1, 2, ..., c,

where c is the number of categories and the τ_i are thresholds. The underlying variable y* is assumed to follow a standard normal distribution. Consequently, the factor model with ordinal data is expressed as

y* = Λf + ε, (2.13)

and the measurement model of a full SEM with ordinal data is

x* = Λ_x ξ + δ,  y* = Λ_y η + ε, (2.14)

where x* and y* are underlying continuous variables. The structural model of a full SEM with ordinal data remains the same as Equation (2.11).
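The underlying-variable construction can be sketched as follows; the thresholds τ = (−1, 0, 1), giving c = 4 categories, are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
y_star = rng.standard_normal(10_000)   # underlying standard normal variable y*
tau = np.array([-1.0, 0.0, 1.0])       # thresholds tau_1, ..., tau_{c-1}

# y = i  if  tau_{i-1} < y* < tau_i, with tau_0 = -inf and tau_c = +inf
y = np.searchsorted(tau, y_star) + 1   # ordinal categories 1, ..., 4
print(np.bincount(y)[1:])              # observed counts per category
```

The category proportions approximate the normal probabilities between thresholds (e.g. about 15.9% in category 1); in practice the thresholds and polychoric correlations are estimated from such category frequencies, not assumed.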

To fit a CFA model or a SEM with ordinal data, the polychoric correlations among the ordinal variables are computed as in Olsson (1979) and their asymptotic covariance matrix Π is estimated as in Jöreskog (1994). Then either the least squares fit function

(n − 1)(s − σ)^T W (s − σ), (2.15)

or the ML fit function

log|Σ| + tr(SΣ^{-1}) − log|S| − p, (2.16)

is minimized to produce parameter estimates, where s stacks the lower off-diagonal part of the polychoric correlation matrix and σ stacks the lower off-diagonal part of the model-implied correlation matrix. Choices of the weight matrix W have been extensively studied in the literature, such as weighted least squares (WLS) with W = Π^{-1} (Muthén, 1978; Browne, 1984), unweighted least squares (ULS) with W = I (Muthén, 1978), and diagonally weighted least squares (DWLS) with W = diag(Π)^{-1} (Muthén et al., 1997).

2.10 Polychoric instrumental variable

Although the discrepancy functions of the methods listed above differ, they share one common feature: the entire correlation matrix and the entire model specification are used to estimate the unknown parameters. For this reason, they are referred to here as system-wide methods, as in Bollen (1996). Bollen (1996) pointed out that system-wide methods are likely to spread biases due to model misspecification over the entire model, and therefore proposed a limited information estimator using instrumental variables to reduce these biases. It was generalized to ordinal data by Bollen & Maydeu-Olivares (2007) and referred to as the polychoric instrumental variable (PIV) estimator. PIV is a two-stage estimation procedure, and only part of the correlation matrix is used at each step of the first stage. Hence, it decomposes the correlation matrix differently from the system-wide methods. Considering the SEM (2.11) and (2.14), the factor loading of the first indicator of every latent variable is scaled to 1.
Hence, the measurement model can be partitioned as

(x₁; x₂) = (I; Λ_{x,2}) ξ + (δ₁; δ₂),
(y₁; y₂) = (I; Λ_{y,2}) η + (ε₁; ε₂), (2.17)

where the blocks within each vector and matrix are stacked vertically.

Subsequently, the full SEM becomes

(y₁; y₂; x₂) = [B, Γ; Λ_{y,2}, 0; 0, Λ_{x,2}] (y₁; x₁)
             + [I − B, −Γ, 0, 0, I; −Λ_{y,2}, 0, I, 0, 0; 0, −Λ_{x,2}, 0, I, 0] (ε₁; δ₁; ε₂; δ₂; ζ), (2.18)

where the block rows of each matrix are separated by semicolons. Model (2.18) is a linear regression model whose regressors are correlated with the error terms. Instrumental variables selected from the ordinal indicators are used to deal with these correlations. In the first stage, the slope coefficients in Model (2.18), denoted by θ₁, are estimated using least squares; θ₁ thus consists of the parameters in B, Γ, Λ_{y,2}, and Λ_{x,2}. To be more specific, let u_j and z_j be the left-hand side observed variable and the vector of right-hand side observed variables, respectively, in the jth row of Model (2.18). The parameters in the jth row can then be estimated by

(P_{vz}^T P_{vv}^{-1} P_{vz})^{-1} P_{vz}^T P_{vv}^{-1} P_{vu}, (2.19)

where P_{vz} = cor(v_j, z_j^T), P_{vv} = cor(v_j, v_j^T), P_{vu} = cor(v_j, u_j), and v_j is a vector of instrumental variables for z_j. In the second stage, the remaining parameter vector, θ₂, is estimated by minimizing the least squares function

(s − σ(θ̂₁, θ₂))^T (s − σ(θ̂₁, θ₂)). (2.20)

Bollen & Maydeu-Olivares (2007) have shown that the PIV estimator is consistent and asymptotically normal if the instrumental variables are chosen appropriately. Because the entire model specification is not used in the first stage, PIV is expected to be more robust against model misspecification for the correctly specified parameters (Bollen, 2001; Bollen & Maydeu-Olivares, 2007). Moreover, two goodness-of-fit test statistics for PIV are proposed in Bollen & Maydeu-Olivares (2007). Nestler (2013) investigated the small-sample properties of PIV by comparing it with ULS and DWLS in the estimation of dichotomous CFA models through a Monte Carlo study. He found that PIV estimators are as accurate as ULS and DWLS estimators when the model is correctly specified and produce less bias when the model is misspecified.
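The instrumental-variable logic behind (2.19) can be illustrated in a scalar toy model; the data-generating values are hypothetical and unrelated to the papers. A regressor correlated with the error biases least squares, while the moment-based IV estimator recovers the true slope:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
theta = 0.5                          # true slope (assumed for illustration)

v = rng.standard_normal(n)           # instrumental variable
w = rng.standard_normal(n)           # shared disturbance creating endogeneity
z = 0.8 * v + w                      # regressor, correlated with the error below
e = 0.6 * w + 0.5 * rng.standard_normal(n)
u = theta * z + e

# Scalar form of the IV estimator (P_vz^T P_vv^{-1} P_vz)^{-1} P_vz^T P_vv^{-1} P_vu
P_vz = np.cov(v, z)[0, 1]
P_vv = v.var()
P_vu = np.cov(v, u)[0, 1]
theta_iv = (P_vz / P_vv * P_vz) ** -1 * (P_vz / P_vv * P_vu)

theta_ols = np.cov(z, u)[0, 1] / z.var()   # biased, since cov(z, e) != 0
print(theta_iv, theta_ols)
```

Here v is a valid instrument for z because it is correlated with z but uncorrelated with e; the IV estimate is close to 0.5 while the least squares estimate is pulled upward by the shared disturbance w.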

3. Research goals

The main research goal of this thesis is to study new estimation methods for factor analysis models and SEMs. To achieve this goal, the thesis is split into three major parts. Papers I and II deal with estimating and selecting non-zero factor loadings in an EFA model. As previously mentioned, truncating factor loadings is a hard-thresholding approach, whereas penalized ML is a soft-thresholding approach. In our experience, the EM-type algorithm proposed by Choi et al. (2010) requires many steps before it converges, and the solution path produced by penalized ML is not necessarily smooth. For these reasons, Papers I and II are devoted to overcoming some limitations of penalized ML estimation for EFA while inheriting the idea of using a soft-thresholding approach to select sufficient factor loadings.

Paper III focuses on multi-group analysis. Under the correct model and correct distributional assumptions, multi-group ML is efficient. However, it involves a large number of parameters. For example, consider a multi-group model with G groups, m factors per group, and p indicators per group. If we assume that strong measurement invariance holds and that the loading matrix is perfect simple, i.e., every indicator loads on exactly one factor, then the free parameters comprise the intercepts (p parameters), the relative factor means (Gm - m parameters), the factor loadings (p - m parameters), the covariances of the latent factors (0.5Gm^2 + 0.5Gm parameters), and the error variances (Gp parameters). In total, there are 0.5Gm^2 + 1.5Gm + Gp + 2p - 2m unknown parameters. If the sample size is not sufficiently large, inadmissible results such as non-converged estimates and non-positive definite covariance matrices are likely to occur. Under the assumption of strong measurement invariance, an alternative is to pool all the data and conduct a single-group analysis. A single-group analysis with pooled data efficiently reduces the number of parameters to be estimated.
Under the same conditions as in the multi-group analysis above, the free parameters of a single-group analysis with pooled data comprise the mean vector (p parameters), the factor loadings (p - m parameters), the covariances of the latent factors (0.5m^2 + 0.5m parameters), and the error variances (p parameters). The total number of parameters is 3p + 0.5m^2 - 0.5m. If the main interest lies in the factor loadings, a single-group analysis is expected to be numerically easier. Hence, Paper III first examines the properties of pseudo ML. Moreover, the effect of pooling has previously been studied in Luo (2011) by investigating the standard errors of least squares estimators, but the effect of pooling under ML is unknown to us. Therefore, a second aim of Paper III is to investigate whether the robustness of normal theory carries over to pooled data.
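When normal-theory assumptions fail, naively computed standard errors can mislead; the generic remedy is a sandwich covariance of the form A^{-1} B A^{-1} (cf. White, 1982). Paper III derives this for the pooled factor model. Purely as a hedged illustration of the same idea in the familiar linear-regression setting (not the thesis' factor-model derivation):

```python
import numpy as np

def naive_cov(X, resid):
    """Model-trusting covariance of the OLS slopes: s^2 (X'X)^{-1}."""
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    return s2 * np.linalg.inv(X.T @ X)

def sandwich_cov(X, resid):
    """Sandwich covariance A^{-1} B A^{-1}: the 'bread' A = X'X is the
    Hessian of the least squares objective; the 'meat'
    B = sum_i e_i^2 x_i x_i' uses the empirical score variation."""
    bread_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    return bread_inv @ meat @ bread_inv
```

When the working model is correct, the two estimates agree asymptotically; when it is wrong, only the sandwich remains valid, which is precisely the failure mode Paper III documents for naive normal-theory standard errors under pooling.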

Paper IV focuses on the properties of PIV. The simulation studies in both Bollen & Maydeu-Olivares (2007) and Nestler (2013) considered only dichotomous data, although ordinal data with more than two categories are the more common scenario. To our knowledge, no simulation studies have investigated the properties of the PIV estimator for ordinal data, and no studies have been devoted to PIV in a full SEM. Therefore, Paper IV conducts a simulation study to examine the properties of PIV in a CFA model and a full SEM with ordinal data. In particular, we are interested in estimation accuracy when the model is correctly and incorrectly specified.

4. Summary of papers

4.1 Paper I

In Paper I the ML fit function (2.2) is Taylor-expanded around the ML estimator $\hat{\theta}$ and the function

$$
\frac{1}{2} \left( \theta - \hat{\theta} \right)^{T}
\frac{\partial^{2} F(\hat{\theta})}{\partial \theta \, \partial \theta^{T}}
\left( \theta - \hat{\theta} \right)
+ n P(\Lambda;\, \beta, w)
\tag{4.1}
$$

is minimized to provide an approximate solution for a penalty function $P$. Both orthogonal and oblique structures fit into the framework of approximated penalized ML (4.1), and it naturally factorizes either a covariance or a correlation matrix. Equation (4.1) is merely a penalized weighted least squares problem and therefore provides a smooth and continuous solution path for a properly chosen penalty term. Plenty of efficient algorithms exist for optimizing penalized least squares problems; hence, approximated penalized ML is numerically more efficient than the EM-type algorithm in Choi et al. (2010). As a tradeoff, approximated penalized ML depends on the starting point $\hat{\theta}$.

Two ways of selecting the tuning parameter are discussed, namely analytical selection and solution path-based selection. Analytical selection relies on a numerical criterion; typical examples are AIC, BIC, mean squared error, and Kullback-Leibler information. Solution path-based selection lets the researcher subjectively choose a path diagram from all the path diagrams suggested by approximated penalized ML. The approximated penalized ML is demonstrated on the Holzinger & Swineford (1939) test dataset. Various penalty terms (e.g., LASSO, SCAD, EN, MC+, and some of their variations) are applied, and they naturally produce sparse loading matrices with many zero loadings. It is demonstrated through the LASSO that approximated penalized ML continuously shrinks factor loadings to zero and suggests a series of loading matrices with different numbers of zero loadings. On the other hand, different combinations of a penalty term and an analytical selection method may lead to loading matrices with different numbers of zeros.
Different starting points also have an impact on the solution path. Therefore, approximated penalized ML can also be understood as a way to continuously select sufficient factor loadings after factor rotation. Furthermore, factorizing a covariance matrix using approximated penalized ML may lead to a different loading matrix than factorizing a correlation matrix.
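The continuity of the solution path can be made concrete in a simplified special case. With the LASSO penalty and a diagonal Hessian in (4.1), the minimizer is elementwise soft-thresholding of the ML estimate, so every loading shrinks continuously to zero as the tuning parameter grows. A sketch under that simplifying assumption (my own illustration; the general, non-diagonal case needs a coordinate-descent or proximal solver):

```python
import numpy as np

def soft_threshold_path(theta_hat, hess_diag, n, betas):
    """Minimizers of 0.5 (theta - theta_hat)' H (theta - theta_hat)
    + n * beta * ||theta||_1 when H = diag(hess_diag): each coordinate
    is soft-thresholded at n * beta / H_jj, so the path is continuous
    in beta. (Simplifying assumption for illustration only.)"""
    theta_hat = np.asarray(theta_hat, dtype=float)
    h = np.asarray(hess_diag, dtype=float)
    path = []
    for beta in betas:
        t = n * beta / h
        path.append(np.sign(theta_hat) * np.maximum(np.abs(theta_hat) - t, 0.0))
    return np.array(path)
```

Note the contrast with truncation at a cut-off (hard thresholding), which leaves surviving loadings unchanged and jumps discontinuously at the cut-off.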

4.2 Paper II

In Paper II a simulation study is conducted to investigate the properties of EFA using the approximated penalized ML introduced in Paper I. Attention is restricted to orthogonal EFA with continuous and normally distributed indicators. Three factor models are considered. All models have three factors, and the number of indicators ranges from 9 to 18. Two models have a perfect simple loading matrix, whereas the third has cross loadings. The sample sizes considered are n = 100 and 200. The LASSO (Tibshirani, 1996), the naive EN (Zou & Hastie, 2005), the EN (Zou & Hastie, 2005), and their adaptive versions (Zou, 2006) are considered, as well as the SCAD (Fan & Li, 2001) and the MC+ (Zhang, 2010). To provide guidelines for choosing the optimal selection method, different analytical selection methods are compared. Hence, Paper II provides information on both the optimal analytical selection method for a given penalty and the optimal combination of penalty and selection method. For all penalty terms, the starting point is the varimax solution. For sparsity, approximated penalized ML is compared with the varimax solution truncated at the cut-off value 0.3; for estimation accuracy, it is compared with the varimax solution both with and without truncation. When a covariance matrix is factorized, two approaches can be applied: the first factorizes the covariance matrix directly, whereas the second first factorizes the correlation matrix and then rescales the results back to the covariance structure using the estimated variances. Both approaches are considered and compared in the paper. The varimax rotation with the cut-off value 0.3 is able to recover the correct loading matrix if it is perfect simple, but seldom recovers a loading matrix that is not perfect simple.
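The varimax-plus-truncation benchmark is easy to reproduce. Here is a sketch using the standard SVD-based varimax iteration (a generic implementation, not the code used in the paper):

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=200):
    """Orthogonal varimax rotation of a p x m loading matrix L,
    via the standard SVD-based fixed-point iteration."""
    p, m = L.shape
    R = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax criterion at the current rotation
        grad = Lr ** 3 - Lr @ np.diag(np.mean(Lr ** 2, axis=0))
        u, s, vt = np.linalg.svd(L.T @ grad)
        R = u @ vt
        d = s.sum()
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
        d_old = d
    return L @ R, R

def truncate(L, cutoff=0.3):
    """Hard-threshold: treat loadings below the cut-off as zero."""
    return np.where(np.abs(L) >= cutoff, L, 0.0)
```

Applied to a loading matrix that is a rotation of a perfect simple structure, this recovers the zero pattern, which is exactly the setting where the benchmark performs well in Paper II.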
The approximated penalized ML with analytical selection does not perform as well as the varimax rotation when the loading matrix is perfect simple, but greatly improves on the varimax rotation otherwise. Further, the perfect simple loading matrix is frequently contained in the solution path, even though an analytical selection may suggest a different loading matrix. Various combinations of a penalty term and a selection method frequently produce a lower average mean squared error for the factor loadings, the covariance (or correlation) matrix, and the factor scores than the varimax solution without truncation. However, approximated penalized ML is generally inferior to the varimax solution with truncation if the loading matrix is perfect simple, and generally better if the loading matrix contains small cross loadings. Given a penalty term, the optimal analytical selection criterion depends on the focus of the study and on whether a covariance or a correlation matrix is factorized. The optimal combination of a penalty term and an analytical selection method also depends on the main purpose of the study, the true loading structure, and the sample size. Generally speaking, if a covariance matrix is factorized, the adaptive penalties and the MC+ with BIC and rescaling work well for all purposes considered in the study, and the SCAD with BIC is also promising. If a correlation matrix is factorized, the MC+ with BIC is a good universal choice. Finally, the effect of rescaling depends on the penalty term: it improves the percentage of correctly recovered loading structures except for the EN and SCAD, and it generally improves estimation accuracy.

4.3 Paper III

In Paper III we study pseudo ML for multi-group data under the assumption of strong measurement invariance. Data are observed from G groups following the multi-group model

$$
y_g = \mu + \Lambda f_g + \varepsilon_g, \qquad g = 1, 2, \ldots, G,
\tag{4.2}
$$

but are pooled, and the single-group model

$$
y = \mu + \Lambda f + \varepsilon
\tag{4.3}
$$

is fitted. It is assumed that the pooled data follow a normal distribution, and the normal theory ML fit function (2.2) for one group is minimized to obtain the pseudo ML estimators. Via a numerical example, it is first shown that the robustness of normal theory does not hold for pooled data. In the example, a two-group model with six indicators and two factors is considered, and the asymptotic relative efficiency of the pseudo ML estimator of the factor loadings with respect to the multi-group ML estimator is computed. It is seen that normal theory may grossly underestimate the standard errors: the naive standard errors are even lower than those produced by the efficient multi-group ML. Second, the correct sandwich estimator of the standard errors for pseudo ML is derived. Using the correct formula, multi-group ML is still asymptotically efficient, with smaller standard errors than those of pseudo ML. Third, a small Monte Carlo study shows that pseudo ML produces fewer inadmissible solutions than multi-group ML when the sample size is small, while the pseudo ML estimator of the factor loadings is as accurate as the multi-group estimator. The same two-group model as in the numerical example is used; hence, there are 30 free parameters for multi-group ML but only 19 free parameters for pseudo ML.
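The parameter counts behind the 30-versus-19 comparison are mechanical to verify. The helper functions below (my own, for illustration) implement the formulas from Section 3:

```python
def n_params_multigroup(G, p, m):
    """Multi-group model under strong measurement invariance with a
    perfect simple loading matrix: intercepts (p), relative factor
    means (G*m - m), free loadings (p - m), factor (co)variances
    (0.5*G*m^2 + 0.5*G*m), and error variances (G*p)."""
    return p + (G * m - m) + (p - m) + G * m * (m + 1) // 2 + G * p

def n_params_pooled(p, m):
    """Single-group model for the pooled data: mean vector (p), free
    loadings (p - m), factor (co)variances (0.5*m^2 + 0.5*m), and
    error variances (p)."""
    return p + (p - m) + m * (m + 1) // 2 + p
```

For the two-group, six-indicator, two-factor example (G = 2, p = 6, m = 2) these give 30 and 19, respectively, matching the totals 0.5Gm^2 + 1.5Gm + Gp + 2p - 2m and 3p + 0.5m^2 - 0.5m.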
The reduction in the parameter space is substantial.

4.4 Paper IV

In Paper IV, the properties of PIV estimators for ordinal data are investigated through a simulation study. Inadmissible rates, bias of parameter estimates, bias of standard errors, and test statistics of PIV are compared with those of system-wide methods, i.e., ULS and DWLS. A CFA model and a full SEM are considered in the paper. The CFA model has 12 indicators and four factors; the SEM has three latent endogenous variables with seven indicators and two latent exogenous variables with five indicators. All indicators are ordinal with five categories. Three levels of normality are considered and three sample sizes are simulated (n = 400, 800, and 3200). The correctly specified CFA model is fitted to investigate whether the PIV estimators are as accurate as the estimators from the system-wide methods. Four levels of misspecification are considered, in which a zero loading is altered to a non-zero value, one at a time; the newly added non-zero loadings are fixed to zero in the estimation. Hence, the CFA model is misspecified in the sense that non-zero loadings are omitted. Likewise, the correctly specified SEM and four levels of misspecified SEM are simulated, with misspecification introduced into the coefficient matrices, the measurement models, or both.

When the CFA model is correctly specified, the PIV estimators of the factor loadings and factor covariances are as accurate as those of the system-wide methods, although non-normality may have negative effects. When the SEM is correctly specified, the average relative bias of the PIV estimators of the loadings and coefficient matrices is generally low, but PIV tends to produce a higher average relative mean squared error. When the CFA model is misspecified, PIV tends to produce more robust estimates of the factor loadings; however, it generally does not accurately estimate the factor covariances and the standard errors of incorrectly specified loadings and factor covariances. When the SEM is misspecified, PIV generally produces less biased estimates of the factor loadings but does not necessarily produce more accurate covariance estimates.
In addition, under model misspecification invalid instrumental variables are likely to be selected, which affects estimation accuracy. The optimal choice of a test statistic for PIV depends on the sample size and the model specification. The mean-and-variance adjusted statistic proposed in Bollen & Maydeu-Olivares (2007) is generally preferred in the CFA model if the sample size is n = 400 or 800, but it is slightly undersized and has low power in the SEM.
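Five-category indicators in such designs are typically obtained by cutting an underlying continuous response at a set of thresholds. A minimal sketch (function names and threshold values are illustrative, not those of Paper IV):

```python
import numpy as np

def discretize(y_star, thresholds=(-1.5, -0.5, 0.5, 1.5)):
    """Turn continuous latent responses y* into ordinal categories
    1..5 by counting how many thresholds each value exceeds."""
    return np.digitize(y_star, thresholds) + 1

def simulate_ordinal(n, loadings, rng=None):
    """Simulate n observations of five-category indicators from a
    one-factor model y* = lambda * f + e with unit-variance y*
    (requires |lambda| < 1), then discretize."""
    rng = rng or np.random.default_rng()
    lam = np.asarray(loadings, dtype=float)
    f = rng.normal(size=(n, 1))
    e = rng.normal(size=(n, lam.size)) * np.sqrt(1.0 - lam ** 2)
    return discretize(f @ lam[None, :] + e)
```

Repeating this over replications, computing polychoric correlations, and fitting by PIV, ULS, and DWLS gives the general shape of such a Monte Carlo design.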

5. Conclusion

This thesis contributes to the estimation of factor analysis models and SEMs in several ways. First, the idea of model penalization and coefficient shrinkage, which has been extensively studied in the statistical literature over the past two decades, is combined with psychometric models. Penalized ML provides an alternative to the traditional estimation-rotation procedure and to the selection of factor loadings. Second, the single-group analysis with pooled data, which is numerically simpler, provides an option for alleviating the numerical issues of a multi-group analysis. The study allows researchers to pool data and conduct a single-group analysis when the sample size is not large enough. Third, it is shown that the robustness of normal theory does not carry over to pooled data: normal theory may severely underestimate the standard errors of the factor loadings. Fourth, the properties of PIV are investigated through a simulation study, and the results shed some light on the comparison between system-wide methods and PIV.

Many future research projects can build on this thesis. The simulation study of approximated penalized ML is restricted to orthogonal EFA with continuous indicators; a future direction is to study oblique and ordinal EFA models. Although approximated penalized ML overcomes some limitations of penalized ML, its dependence on the starting point is noted in Paper I, so guidelines on the optimal starting point are desirable. In addition, Paper III considers the case where group memberships are known; when group information is unknown, it would be worthwhile to compare pseudo ML with the mixture factor model. Studies on PIV can also be extended to other topics. For example, model misspecification, such as incorrectly scaling a cross loading to one in the first stage of PIV, is worth investigating.

Acknowledgements

First I would like to gratefully thank my advisor Fan Yang-Wallentin. You brought me to the subject of structural equation modelling. As an advisor, you have put a lot of effort into supervising my research, introducing me to the big figures in the psychometric society, and creating opportunities for me. As a senior, you are always available for advice. I could not ask an advisor to do more than what you have done for me.

I also wish to thank another advisor of mine, Rolf Larsson. You supervised a small project of mine when I was a master's student in the department and my licentiate thesis after I became a PhD student. You encouraged me to pursue a PhD, shared many interesting comments with me, and discussed various topics with me. These experiences have always been very helpful.

I want to extend my thanks to Irini Moustaki, who hosted my visit at LSE as an exchange student. It was a short visit, but you made the stay very cheerful. It is an honour to cooperate with you. Your expertise in the area and enthusiasm towards research are the keys behind the papers with you.

My special thanks go to Adam Taube, who was my first advisor during my stay in Sweden. We had so many interesting conversations when you supervised my master's thesis and all the time after that. You are more like a kind and knowledgeable grandpa than an advisor. I am flattered to be your second best Chinese student who likes smörgåstårta.

I sincerely extend my thanks to my colleagues at the department, notably my officemates Björn Andersson and David Kreiberg. You are always in the department, day and night. Your company is an unfading color of my life as a PhD student. To Björn Andersson: you literally witnessed my entire stay at the department, not only working hard in the office but also travelling around the world on three continents. You are talented as a researcher, pushing me to work harder, and trustworthy as a good friend, keeping my secrets and helping me in many ways. To David Kreiberg: our endless academic discussions have helped me enhance my statistical knowledge, and our non-academic talks and buffet nights have made PhD life after work exciting.

Many other people are part of my wonderful life in Sweden. Hao Li, Hao Luo, Jia Zhou, Daniel Ekbom, Jian Kang, Ran He, Qiao Wei, Xing He, and Zheng Ning, you are very good friends. My dormmates at 5082, Yin Zhang, Jiayin Zheng, and Yongxin Xu, you still have my back even though you are not in Sweden. Yi Xu, we have not seen each other for a long time, but a "what's up" every day makes me feel like we were still young and still studying together somewhere in ZJG. Xuan Li, thanks for your company during the last leg of the program.

Finally, I would like to thank my parents for years of support. You encouraged me to pursue a PhD when I was hesitating. Whenever I felt sad and lonely, I could always find my strength again in you. Without your love, I could not have completed this thesis.

References

Amemiya, Y., Fuller, W. A., & Pantula, S. G. (1987). The asymptotic distributions of some estimators for a factor analysis model. Journal of Multivariate Analysis, 22, 51-64.
Anderson, T. W., & Amemiya, Y. (1988). The asymptotic normal distribution of estimators in factor analysis under general conditions. The Annals of Statistics, 16, 759-771.
Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61, 109-121.
Bollen, K. A. (2001). Two-stage least squares and latent variable models: Simultaneous estimation and robustness to misspecifications. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future (pp. 119-138). Lincolnwood, IL: Scientific Software.
Bollen, K. A., & Maydeu-Olivares, A. (2007). A polychoric instrumental variable (PIV) estimator for structural equation models with categorical variables. Psychometrika, 72, 309-326.
Browne, M. W. (1984). Asymptotically distribution-free methods in the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36, 111-150.
Carroll, J. B. (1953). An analytic rotation for approximating simple structure in factor analysis. Psychometrika, 18, 23-38.
Choi, J., Zou, H., & Oehlert, G. (2010). A penalized maximum likelihood approach to sparse factor analysis. Statistics and Its Interface, 3, 429-436.
Donoho, D. L., & Johnstone, J. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425-455.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
Hair, J., Black, W., Babin, B., & Anderson, R. (2010). Multivariate data analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Hendrickson, A. E., & White, P. O. (1964). Promax: A quick method for rotation to oblique simple structure. British Journal of Mathematical and Statistical Psychology, 17, 65-70.
Hirose, K., & Yamamoto, M. (2014a). Estimation of an oblique structure via penalized likelihood factor analysis. Computational Statistics & Data Analysis, 79, 120-132.
Hirose, K., & Yamamoto, M. (2014b). Sparse estimation via non-concave penalized likelihood in factor analysis model. Statistics and Computing. Advance online publication.

Holzinger, K., & Swineford, F. (1939). A study in factor analysis: The stability of a bifactor solution. Supplementary Educational Monograph, No. 48. Chicago, IL: University of Chicago Press.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443-482.
Jöreskog, K. G. (1994). On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika, 59, 381-389.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187-240.
Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64-82.
Luo, H. (2011). The effect of pooling multi-group data on the estimation of factor loadings. Unpublished manuscript, Department of Statistics, Uppsala University, Uppsala, Sweden.
Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics. Hoboken, NJ: Wiley.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.
Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
Muthén, B., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished manuscript.
Nestler, S. (2013). A Monte Carlo study comparing PIV, ULS and DWLS in the estimation of dichotomous confirmatory factor analysis. British Journal of Mathematical and Statistical Psychology, 66, 127-143.
Neuhaus, J. O., & Wrigley, C. (1954). The quartimax method: An analytical approach to orthogonal simple structure. British Journal of Mathematical and Statistical Psychology, 7, 81-91.
Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443-460.
Papadopoulos, S., & Amemiya, Y. (2005). Correlated samples with fixed and nonnormal latent variables. The Annals of Statistics, 33, 2732-2757.
Satorra, A. (2002). Asymptotic robustness in multiple group linear-latent variable models. Econometric Theory, 18, 297-312.
Sörbom, D. (1974). A general method for studying differences in factor means and factor structure between groups. British Journal of Mathematical and Statistical Psychology, 27, 229-239.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58, 267-288.
Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 273-282.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-25.
Yanai, H., & Ichikawa, M. (2006). Factor analysis. In C. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 257-296). North Holland: Elsevier.

Yuan, K.-H., & Bentler, P. M. (1997). Improving parameter tests in covariance structure analysis. Computational Statistics & Data Analysis, 26, 177-198.
Zhang, C. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894-942.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301-320.