Misspecification in Nonrecursive SEMs 1 Nonrecursive Latent Variable Models under Misspecification
Abstract

A central problem in structural equation modeling (SEM) is specification error in the measurement model and its propagation through nonrecursive latent variable models. Full-information estimators such as maximum likelihood are consistent when the model is correctly specified and the sample size is large enough; however, a misspecification in one part of the model can affect parameter estimates in other parts. The goal of this study was to compare the accuracy and efficiency of (a) Jöreskog and Sörbom's (2007) TSLS estimator (JS-TSLS), (b) Bollen's (1996a, 1996b, 2001) TSLS estimator (KB-TSLS), (c) Bayesian estimation, (d) maximum likelihood (ML), and (e) the latent variable score approach (LVS; Croon, 2002; Jöreskog & Sörbom, 1999) in nonrecursive latent variable models under small to moderate sample sizes.
Introduction and Goals

The use of nonrecursive models in applied research is increasing (Kaplan, 2009; Paxton, Hipp, & Marquart-Pyatt, 2011). Nonrecursive models are simultaneous equation models, or SEMs, with bidirectional feedback loops (Kaplan, 2009). They add a layer of complexity to parameter estimation: for example, the rank and order conditions for identification must be established, and the dynamic system must be stable. Stability in a nonrecursive model implies an underlying dynamic specification of the structural model (e.g., some period of time must elapse for the feedback to occur).

The goals of this study were threefold. The first goal was to examine the degree of accuracy (expressed as percentage bias) of parameter estimates in the measurement and structural portions of a nonrecursive latent variable model for (a) Jöreskog and Sörbom's TSLS estimator (JS-TSLS), (b) Bollen's (1996a, 1996b, 2001) TSLS estimator (KB-TSLS), (c) Bayesian estimation, (d) maximum likelihood (ML), and (e) the latent variable score (LVS) approach. The second goal was to examine the relative bias produced under three levels of model specification (correct, moderately misspecified, and severely misspecified) for these estimators across very small to moderate sample sizes. The final goal was to examine how, and to what degree, misspecification severity affects the propagation of bias from the measurement model to the structural model.

Sample Size

The SEM literature examining how maximum likelihood estimation affects the percentage of proper solutions, the accuracy of parameter estimates, and the
appropriateness of the overall chi-square test reveals that large sample sizes (N) are required for unbiased parameter estimates. For example, Anderson and Gerbing (1988) recommend an N of 100 to 150, whereas Boomsma (1983) recommends N = 400. Bentler and Chou (1987) suggest that a ratio as low as five subjects per variable is sufficient for normal and elliptical distributions when the latent variables have multiple indicators. MacCallum, Browne, and Sugawara (1996) provide support for large N, noting that the power and precision of parameter estimates increase monotonically with sample size and degrees of freedom. Ding, Velicer, and Harlow (1995) reported that the likelihood of fully proper solutions increased with the number of indicators per factor, the sample size, and the magnitude of the factor loadings. Further complicating matters is the issue of model specification (or misspecification) and how it propagates through the system of equations (Paxton, Hipp, & Marquart-Pyatt, 2011).

Model

The modified LISREL SEM notation presented in Bollen (2001) was followed in this investigation. Specifically, the nonrecursive SEM is represented using the following general notation:

$$
\begin{aligned}
\eta &= \alpha_\eta + B\eta + \Gamma\xi + \zeta, \\
y &= \alpha_y + \Lambda_y \eta + \varepsilon, \\
x &= \alpha_x + \Lambda_x \xi + \delta,
\end{aligned}
\tag{1}
$$

where $\eta$ is a vector of latent endogenous variables, $\xi$ is the vector of latent exogenous variables, and $\zeta$ is a vector of disturbances. The matrix $B$ provides the effects of latent endogenous variables on one another and the $\Gamma$ matrix provides the
effects of the latent exogenous variables on the latent endogenous variables. The $\alpha$ terms denote the respective equation intercepts. The $y$ and $x$ vectors represent observed variables that are affected by $\eta$ and $\xi$ with coefficient matrices $\Lambda_y$ and $\Lambda_x$, respectively. Also, in Equation 1, $E(\zeta) = 0$ and $\mathrm{COV}(\xi, \zeta) = 0$. In the measurement model, $E(\varepsilon) = 0$, $E(\delta) = 0$, and these unique factors are uncorrelated with $\xi$ and $\zeta$ and with each other.

Model Estimators

Two-Stage Least Squares (TSLS) Estimators

Two-stage least squares (TSLS) estimators enjoy a long and well-established history in econometrics. In this investigation, we applied (a) Bollen's (1996a, 1996b, 2001, pp. 122-124) estimator (labeled here as KB-TSLS), presented in Equation 2, and (b) Jöreskog and Sörbom's (1999, 2007) TSLS estimator (labeled JS-TSLS). In KB-TSLS (Equation 2), the latent variable model is rearranged into an observed variable model and estimation proceeds in a single step. In JS-TSLS, the measurement model is estimated first, and the structural relations among the latent variables are then estimated in a second step (Jöreskog & Sörbom, 1999, p. 168).

$$
\begin{aligned}
y_1 &= \alpha_\eta + B y_1 + \Gamma x_1 + \varepsilon_1 - B\varepsilon_1 - \Gamma\delta_1 + \zeta, \\
y_2 &= \alpha_{y_2} + \Lambda_{y_2} y_1 - \Lambda_{y_2}\varepsilon_1 + \varepsilon_2, \\
x_2 &= \alpha_{x_2} + \Lambda_{x_2} x_1 - \Lambda_{x_2}\delta_1 + \delta_2,
\end{aligned}
\tag{2}
$$
where $y_2$ and $x_2$ are vectors of nonscaling indicators (and $y_1$ and $x_1$ are the corresponding vectors of scaling indicators). Equation 2 is simplified in Equation 3 by taking, for example, the $j$th equation from $y_1$:

$$
y_j = \alpha_j + B_j y_1 + \Gamma_j x_1 + u_j, \tag{3}
$$

where $y_j$ is the $j$th $y$ from $y_1$, $\alpha_j$ is the intercept, $B_j$ is the $j$th row from $B$, $\Gamma_j$ is the $j$th row from $\Gamma$, and $u_j$ is the $j$th element from $u$, where $u = \varepsilon_1 - B\varepsilon_1 - \Gamma\delta_1 + \zeta$.

Two-stage least squares estimators are also known as instrumental variable (IV) estimators because of the essential role that IVs play in the estimation process. Specifically, IV estimation was developed to resolve model identification or estimation problems that arise when regressors are correlated with the error terms (as in nonrecursive models). For an observed variable, say $z_1$, to serve as an IV, it must (a) be uncorrelated with the disturbance $\zeta_1$, that is, $\mathrm{COV}(z_1, \zeta_1) = 0$, and (b) be correlated with the variable for which it is an instrument, that is, $\mathrm{COV}(z_1, x_1) \neq 0$.

Bayesian Estimation

For an excellent introduction to the statistical mechanics of Bayesian parameter estimation and model specification, see Hoff (2009). Using conjugate (informative) priors for parameters within the Bayesian framework is particularly important when the sample is small to moderate (Gelman, 2004). In the present study, conjugate priors matter all the more because the sensitivity of the analytic approach is critical to the detection of parameter bias. Bayesian estimation was based on known distributional properties of the data to inform conjugate priors for the model parameters (θ)
and the covariance matrix Σ, with Σ ~ inverse Wishart, IW(0, 4). The values of 0 and 4 were selected based on the distributional properties of the multivariate distribution used in the population model. The conditional distribution of one set of parameters given the other sets was used to approximate the joint distribution of all parameters (Muthén & Asparouhov, 2012). The posterior predictive p-value (PPP) was .65, indicating no specification problems. The Gelman-Rubin convergence diagnostic uses the potential scale reduction statistic (PSR; Gelman, Carlin, Stern, & Rubin, 2004), which compares the between-chain variation with the within-chain variation. Values between 1 and 1.1 are considered ideal; the observed PSR over the 50,000 MCMC draws was 1.02, so convergence was verified by a PSR value < 1.1.

Maximum Likelihood

Maximum likelihood estimation of model parameters proceeded as implemented in Mplus version 7.0 (Muthén & Muthén, 2012). Interested readers may refer to any number of excellent publications on the statistical details of maximum likelihood estimation (e.g., Bollen, 1989; Kaplan, 2000, 2010).

Latent Variable Scores

Latent variable scores (LVS) are predicted scores based on a weighted function of observed variables (Bollen, 1989; Croon, 2002, p. 201; Jöreskog & Sörbom, 1999). They have been cleansed of measurement error yet retain the error variance in the latent variable. One can create LVS based on the measurement
model(s) for the latent variables and then use the LVS in a nonrecursive system (e.g., an approach used in classic econometrics). However, an adequate measurement model is crucial before LVS are developed.

Methods

Population model parameter values were based on the standardization sample for the Wechsler Preschool and Primary Scale of Intelligence, Fourth Edition (Wechsler, 2012). The population model (Figure 1) exhibited perfect fit to the data (χ²(8) = 4.2, p = .83; CFI = 1.0; RMSEA = 0.0; SRMR = 0.0). The nonrecursive system exhibited freedom from linear dependency (i.e., the system was stable), as evidenced by a stability index of .20 (Bentler & Freeman, 1983). The model is of a kind common in social and behavioral research and incorporates components found in many applications of SEMs.
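The stability index mentioned above can be illustrated numerically. The sketch below follows the usual reading of Bentler and Freeman (1983), in which the index is the largest modulus among the eigenvalues of the coefficient matrix B and the system is taken to be stable when that value is below 1. The two reciprocal coefficients are hypothetical values chosen only so that the index comes out at .20; they are not the population values of this study, and `stability_index` is an illustrative helper name.

```python
import numpy as np


def stability_index(B):
    """Return the largest modulus among the eigenvalues of B."""
    eigenvalues = np.linalg.eigvals(np.asarray(B, dtype=float))
    return float(np.max(np.abs(eigenvalues)))


# Hypothetical reciprocal effects between two latent endogenous variables
# (illustrative only; NOT the population values used in the study).
b12 = 0.25  # effect of eta2 on eta1
b21 = 0.16  # effect of eta1 on eta2
B = np.array([[0.0, b12],
              [b21, 0.0]])

index = stability_index(B)
is_stable = index < 1.0  # stable when the index is below 1
print(f"stability index = {index:.2f}, stable = {is_stable}")
```

For a two-variable feedback loop the eigenvalues of B are the positive and negative square roots of the product of the two reciprocal paths, so the index shrinks as either path weakens.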
Figure 1. Population model. Latent variables: VC (η1) with disturbance ζ1, VS (η2) with disturbance ζ2, and WM (ξ1). Indicators: Information (Y1), Receptive Vocabulary (Y2), Block Design (Y3), Object Assembly (Y4), Picture Memory (X1), and Zoo Locations (X2), with errors ε1 to ε4 and δ1 to δ2.

Note. Correct specification (SP1) = solid lines only; moderate misspecification (SP2) = solid paths + dashed path (- - -); severe misspecification (SP3) = solid paths + dashed path (- - -) + dashed-dotted path (- -). VC = verbal comprehension, VS = visual spatial, WM = working memory.

Specification Conditions

Three model specification conditions were examined. Specification 1 (SP1), the correctly specified model, included all solid paths. Specification 2 (SP2) added the path from the latent variable VC to Block Design (Y3). Specification 3 (SP3) added the paths from VC to Block Design (Y3) and from WM to Receptive Vocabulary (Y2).

Model Parameterization and Sample Size

For model parameterization, the guidelines provided by Paxton et al. (2001) were followed. These included (a) R² values (loadings of .25 to .57) and bias levels
that are typically seen (> 10%) in applied research and are thus meaningful to interpret, (b) statistical significance that is plausible at the smallest sample size, and (c) statistical power ranging from moderate to high (.80 to .95). Five sample sizes were included: 60, 100, 200, 300, and 500.

Data Generation and Parameter Estimation

The internal Monte Carlo facility within Mplus version 7.0 (Muthén & Muthén, 2012) was used to derive the ML, Bayesian, KB-TSLS, and LVS estimates. The JS-TSLS parameter estimates were derived using LISREL 8.8 (Jöreskog & Sörbom, 2007). For each estimation method, parameter values from the population model served as starting values.

Monte Carlo Study

Data for the variables in the model were generated from a multivariate normal distribution under each of the following conditions: (a) Specification 1 = no misspecification, N = 60, 100, 200, 300, 500; (b) Specification 2 = moderate misspecification, N = 60, 100, 200, 300, 500; (c) Specification 3 = severe misspecification, N = 60, 100, 200, 300, 500. This 3 (misspecification) × 5 (sample size) × 5 (estimation method) design, with 700 replications per condition, resulted in 52,500 data sets.
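The design arithmetic and the percent-bias summary can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the procedure run in Mplus or LISREL: it replaces the latent variable model with a hypothetical two-equation observed-variable nonrecursive system, uses textbook two-stage least squares (not the KB-TSLS or JS-TSLS implementations), and all population values, the seed, and the function names (`simulate`, `tsls_first_equation`, `percent_bias`) are invented for the example.

```python
import itertools

import numpy as np

# Design factors as described in the text.
SPECIFICATIONS = ("SP1", "SP2", "SP3")
SAMPLE_SIZES = (60, 100, 200, 300, 500)
ESTIMATORS = ("ML", "Bayes", "KB-TSLS", "JS-TSLS", "LVS")
REPLICATIONS = 700

cells = list(itertools.product(SPECIFICATIONS, SAMPLE_SIZES, ESTIMATORS))
total_runs = len(cells) * REPLICATIONS  # 3 * 5 * 5 * 700 = 52,500

# Hypothetical population values (assumptions, not the study's parameters):
#   y1 = b12*y2 + g11*x1 + e1
#   y2 = b21*y1 + g22*x2 + g23*x3 + e2
B = np.array([[0.0, 0.25],
              [0.16, 0.0]])
G = np.array([[0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5]])


def simulate(n, rng):
    """Draw n cases from the reduced form y = (I - B)^{-1} (G x + e)."""
    x = rng.standard_normal((n, 3))
    e = rng.standard_normal((n, 2))
    y = (x @ G.T + e) @ np.linalg.inv(np.eye(2) - B).T
    return y, x


def tsls_first_equation(y, x):
    """Two-stage least squares for (b12, g11), using all x as instruments."""
    # Stage 1: replace the endogenous regressor y2 by its projection on x.
    stage1, *_ = np.linalg.lstsq(x, y[:, 1], rcond=None)
    y2_hat = x @ stage1
    # Stage 2: regress y1 on the projected y2 and the included x1.
    regressors = np.column_stack([y2_hat, x[:, 0]])
    estimates, *_ = np.linalg.lstsq(regressors, y[:, 0], rcond=None)
    return estimates


def percent_bias(estimates, true_value):
    """Percent relative bias of the mean estimate across replications."""
    return 100.0 * (np.mean(estimates) - true_value) / true_value


rng = np.random.default_rng(2024)  # arbitrary seed for reproducibility
bias_by_n = {}
for n in SAMPLE_SIZES:
    draws = np.array([tsls_first_equation(*simulate(n, rng))
                      for _ in range(REPLICATIONS)])
    bias_by_n[n] = percent_bias(draws[:, 0], B[0, 1])
    print(f"N = {n}: percent bias in b12 = {bias_by_n[n]:.1f}%")
```

The second equation carries two exogenous variables so that the first equation is overidentified; a just-identified two-stage least squares estimator has no finite mean, which would make a mean-based bias summary unstable at the smallest sample sizes.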
Results

Tables 1-3 summarize the percent bias for all estimators across sample size conditions.

Insert Tables 1-3 Here

Inspection of Tables 1 through 3 reveals that the LVS approach yielded the lowest amount of structural model parameter bias (22%). Closer inspection reveals that in the SP1 (no misspecification) and SP3 (severe misspecification) conditions, the LVS approach yielded the lowest rate of structural model bias. For SP2, the ML approach yielded the lowest rate of bias (26%) across sample size conditions. Under moderate and severe misspecification, the ML, Bayesian, and JS-TSLS approaches produced biased parameter estimates (between 33% and 63%) in the measurement portion of the models.

Scholarly Significance

Results of methodologically oriented studies such as ours are of interest to theoretical and applied researchers who use structural equation modeling and wish to better inform their modeling strategies. The first goal was to examine the degree of accuracy
(expressed as percentage bias) in parameter estimates in the measurement and structural portions of a population model for the JS-TSLS, KB-TSLS, Bayesian, ML, and LVS approaches. The second goal was to examine the relative bias produced by three levels of model misspecification for the JS-TSLS, KB-TSLS, Bayesian, ML, and LVS estimators across very small to moderate sample sizes. The final goal was to examine how misspecification severity affected the propagation of bias from the measurement model to the structural model.

The results demonstrate that when the measurement model is correctly specified and fits well, the LVS approach provides superior results. Additionally, for the ML, Bayesian, JS-TSLS, and KB-TSLS estimators, bias originating in the measurement portion of the model appeared to propagate into the structural model at a level that is of concern. These results give researchers information for choosing the most appropriate estimation method for nonrecursive models and for avoiding biased results from incorrectly specified models in light of sample size conditions.
Table 1
No Misspecification Condition

Columns within each sample size, in order: ML, Bayes, KB-TSLS, JS-TSLS, LVS. Each row lists the parameter followed by its displayed bias values.

N = 60, N = 100, N = 200
INSS VC
RVSS VC    29% 16%
BDSS VS    25%
OASS VS    37% 13%
PMSS WM    46% 11% 17% 17%
ZLSS WM    12% 32% 16% 37%
VS VC      16% 22% 46% 19% >50% >50% 28% >50% >50% 36% 16%
VC VS      21% 17% 27% 21% 38%
VS WM      17% 36% >50% 27% 40% 33% 14% >50% 38% 39% 48%

N = 300, N = 500
INSS VC
RVSS VC
BDSS VS
OASS VS
PMSS WM
ZLSS WM    15%
VS VC      >50% >50% >50% >50%
VC VS      13% 18% 15%
VS WM      17% 43% 16% 21% 25% 36% 18%

Note. Only bias of 10% or larger is displayed. Empty cells denote bias < 10%. LVS yielded zero bias in structural model*
Table 2
Moderate Misspecification Condition

Columns within each sample size, in order: ML, Bayes, KB-TSLS, JS-TSLS, LVS. Each row lists the parameter followed by its displayed bias values.

N = 60, N = 100, N = 200
INSS VC    17%
BDSS VC*   29% 18% >50% >50% >50% 23%
BDSS VS    15% >50%
OASS VS    36% 12% 50%
PMSS WM    >50% 12% 26%
ZLSS WM    12% 17% 30% 15% 28% 12%
RVSS VC    39% 37% 15% 11%
RVSS WM    22% >50% >50% 26%
VS VC      >50% >50% 12% 37% 20% 46% 13% 24% 11%
VC VS      17% >50% 13% 11% 12% 25% 23% 13% > 44% >50% 39% 13%
VS WM      >50% 15% 16% 32% >50% 14% 47%

N = 300, N = 500
INSS VC
BDSS VC*   18% 24%
BDSS VS    44% >50%
OASS VS    18%
PMSS WM    11%
ZLSS WM    12% 13%
RVSS VC    16%
RVSS WM    >50% >50% >50% >50% >50%
VS VC      23% 40% 12%
VC VS      35% 30% 25% 13% 25% 39%
VS WM      15% 30% 18% 13%

Note. Only bias of 10% or larger is displayed. Empty cells denote bias < 10%. ML method yielded the lowest bias in structural model (26%).
Table 3
Severe Misspecification Condition

Columns within each sample size, in order: ML, Bayes, KB-TSLS, JS-TSLS, LVS. Each row lists the parameter followed by its displayed bias values.

N = 60, N = 100, N = 200
INSS VC
BDSS VC    >50% >50% >50% >50% >50% >50%
BDSS VS    13% 15% 18%
OASS VS    15%
PMSS WM    >50% 74% 26%
ZLSS WM    20% 17% 15% 28%
RVSS VC    >50% 15% 11%
RVSS WM    >50% >50% 10%
VS VC      20% >50% >50% 46% 29% 46% 29% 11%
VC VS      20% 23% 24% 23%
VS WM      >50% 15% 26% 15% 34% 23% 47%

N = 300, N = 500
INSS VC
BDSS VC*   >50% >50% 28% 44%
BDSS VS
OASS VS
PMSS WM    52% 11%
ZLSS WM    15% 15% 13%
RVSS VC
RVSS WM*
VS VC      36% 15%
VC VS      37%
VS WM      11% 18%

Note. Only bias of 10% or larger is displayed. Empty cells denote bias < 10%. Latent variable score method yielded zero bias in structural model.
References

Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411-423.

Bentler, P. M., & Chou, C. (1987). Practical issues in structural equation modeling. Sociological Methods and Research, 16, 78-117.

Bentler, P. M., & Freeman (1983). Tests for stability in linear structural equation systems. Psychometrika, 48, 143-145.

Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: John Wiley & Sons.

Bollen, K. A. (1996a). A limited information estimator for LISREL models with and without heteroscedastic errors. In G. Marcoulides & R. Schumacker (Eds.), Advanced structural equation modeling techniques. Mahwah, NJ: Lawrence Erlbaum.

Bollen, K. A. (1996b). An alternative two-stage least squares estimator for latent variable equations. Psychometrika, 61, 109-121.

Bollen, K. A. (2001). Two-stage least squares and latent variable models: Simultaneous estimation and robustness to misspecifications. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future. Chicago, IL: Scientific Software.

Bollen, K. A., Kirby, J. A., Curran, P. J., & Paxton, P. M. (2007). Latent variable models under misspecification: Two-stage least squares (TSLS) and maximum likelihood (ML) estimators. Sociological Methods & Research, 36(1), 48-86.

Boomsma, A. (1983). On the robustness of LISREL against small sample size and nonnormality. Amsterdam: Sociometric Research Foundation.

Croon, M. (2002). Using predicted latent scores in general latent structure models. In G. Marcoulides & I. Moustaki (Eds.), Latent variable and latent structure models (Chapter 10). Mahwah, NJ: Lawrence Erlbaum Associates.

Ding, C., Velicer, W. F., & Harlow, L. L. (1995). The effects of estimation methods, number of indicators per factor and improper solutions on structural equation model fit indices.
Structural Equation Modeling, 2, 119-144.
Hoff, P. D. (2009). A first course in Bayesian statistical methods. Springer Texts in Statistics. New York, NY: Springer.

Jöreskog, K. G., Sörbom, D., du Toit, S., & du Toit, M. (1999). LISREL 8: New statistical features. Chicago, IL: Scientific Software.

Jöreskog, K. G., & Sörbom, D. (2007). LISREL version 8.8 [Computer program]. Chicago, IL: Scientific Software.

Kaplan, D. (2001). Structural equation modeling: Foundations and extensions. Thousand Oaks, CA: Sage Publications.

Kaplan, D. (2010). Structural equation modeling: Foundations and extensions (2nd ed.). Thousand Oaks, CA: Sage Publications.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.

Muthén, L. K., & Muthén, B. O. (2012). Mplus version 7.0 [Computer program]. Los Angeles, CA: Muthén & Muthén.

Paxton, P. M., Curran, P. J., Bollen, K. A., & Kirby, J. (2001). Monte Carlo experiments: Design and implementation. Structural Equation Modeling, 8(2), 287-312.

Paxton, P. M., Hipp, J. R., & Marquart-Pyatt (2011). Nonrecursive models: Endogeneity, reciprocal relationships, and feedback loops. Thousand Oaks, CA: Sage Publications.

Wechsler, D. (2012). Wechsler Preschool and Primary Scale of Intelligence, Fourth Edition. San Antonio, TX: NCS Pearson.