Asymptotic robustness of standard errors in multilevel structural equation models

Journal of Multivariate Analysis 97 (2006) 1121–1141
www.elsevier.com/locate/jmva

Ke-Hai Yuan (a,*), Peter M. Bentler (b)
(a) University of Notre Dame, IN, USA
(b) University of California, Los Angeles, CA, USA

Received 20 May 2004
Available online 28 July 2005

Abstract

Data in social and behavioral sciences are often hierarchically organized. Multilevel statistical methodology was developed to analyze such data. Most of the procedures for analyzing multilevel data are derived from maximum likelihood based on the normal distribution assumption. Standard errors for parameter estimates in these procedures are obtained from the corresponding information matrix. Because practical data typically contain heterogeneous marginal skewnesses and kurtoses, this paper studies how nonnormally distributed data affect the standard errors of parameter estimates in a two-level structural equation model. Specifically, we study how skewness and kurtosis in one level affect standard errors of parameter estimates within its level and outside its level. We also show that, parallel to asymptotic robustness theory in conventional factor analysis, conditions exist for asymptotic robustness of standard errors in a multilevel factor analysis model.

© 2005 Elsevier Inc. All rights reserved.

AMS 1991 subject classification: primary 62H05; secondary 62H25; 62P25

Keywords: Nonnormal data; Skewness; Kurtosis; Standard errors; Asymptotic robustness

This research was supported by NSF Grant DMS04-37167 and Grants DA01070 and DA00017 from the National Institute on Drug Abuse.
(*) Corresponding author. Fax: +1 574 631 8700. E-mail address: kyuan@nd.edu (K.-H. Yuan).
0047-259X/\$ - see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.jmva.2005.06.003

1. Introduction

Data in social and behavioral sciences often exhibit a hierarchical structure. For example, households are nested within neighborhoods, neighborhoods are nested within cities, and cities are further nested within countries; students are nested within classes, classes are nested within schools, and schools are further nested within school districts. Because observations within a cluster are generally correlated, statistical methods for such data have been developed to explicitly account for these correlations in order to get accurate results. When predictors are error free, using the hierarchical linear model (HLM) leads to correct analysis of the hierarchical data [8,10,11,13,17,27,31]. When predictors contain measurement errors, HLM is not appropriate because it does not account for these errors. The multilevel structural equation model (SEM) has to be used to obtain consistent parameter estimates [2,6,14–16,20,23,24]. Actually, when errors in predictors are modeled explicitly, a HLM model automatically becomes a multilevel SEM model [9,18,21,25].

All the above literature deals with model inference through methods that require the multivariate normality assumption for hierarchical data. Real data typically have larger skewness and kurtosis than those of a normal distribution [29]. For example, Micceri [22] reported that among 440 large sample achievement and psychometric measures taken from journal articles, research projects, and tests, all were significantly nonnormally distributed. In reality, the normality assumption used in modeling should be considered as only a working assumption. When extra kurtoses exist in the data, the actual standard errors (SEs) of parameter estimates will be larger than those based on the normality assumption; the likelihood ratio statistics for testing variance components or the overall model structure also tend to be more significant than they actually are. Consequently, inference based on the normality assumption may be no longer valid when modeling practical data.

A few studies on the multilevel model with violations of the normality assumption also exist [5,26,34,35]. Within the context of HLM, Cheong et al. [5] studied the effect of nonnormal data on SEs of regression parameter estimates using simulation. They found that the SEs based on a sandwich-type covariance matrix are quite robust even when the model is misspecified. For two-level HLM and SEM, [34] discussed the strength of several possible sandwich-type covariance matrices. Neither of these papers analytically studied the effect of skewness and kurtosis on SEs. Yuan and Bentler [34–37] studied the behavior of the normal theory based likelihood ratio statistic with distribution violations; they also proposed several alternative statistics for overall model evaluation. However, the effect of nonnormal data on SEs of the parameter estimates in the context of multilevel models has not been well studied. For example, there are level-1 and level-2 random components in a two-level SEM model. It is not clear how the kurtosis in level-1 affects the SEs of level-1 and level-2 parameter estimates. How the skewnesses of level-1 and level-2 components affect the SEs of parameter estimates at different levels is not clear either. When data are elliptically distributed (see [7,12]), SEs of factor loading estimates within the conventional SEM context can be obtained by adjusting the corresponding normal distribution based SEs [4,28,30] using the relative kurtosis. It is of interest to see whether this result can be extended to a multilevel confirmatory factor model.
In the context of conventional factor analysis, there also exist results of asymptotic robustness on SEs. For example, Anderson and Amemiya [1] found that the SEs of factor loading estimates using the normal distribution assumption can be asymptotically valid for nonnormal data. A further result in this direction was obtained in [32] by considering skewed data whose fourth-order moments are identical to those of an elliptical distribution. It is of interest to see whether parallel results hold in a multilevel factor analysis model.

Our purpose is to systematically study the effect of nonnormal data on SEs of parameter estimates in multilevel analysis. Although aiming to study SEs for multilevel SEM, we only explicitly consider the two-level model; generalization from a two-level model to a higher level model can be performed in a parallel way. Our study for SEs will be based on a quite general model formulated in [16]. Within this model, we will study the effect of skewness and kurtosis on SEs for estimates of parameters that are shared by the mean, the between- and within-level covariance structures. We also study the effect of skewness and kurtosis on SEs for estimates of parameters that are unique to the mean, the between-level covariance matrix, and the within-level covariance matrix. In Section 2 we characterize the effect of skewness (third-order moments) and kurtosis (fourth-order moments) on SEs of parameter estimates at either level. By focusing on a class of nonnormal distributions in Section 3, we study the asymptotic robustness of SEs for parameter estimates in a two-level factor analysis model. We will state and discuss the results in Sections 2 and 3. Proofs of these results will be provided in the appendix.

2. The effect of skewness and kurtosis on standard errors

Any multilevel data must have a hierarchical structure. In addition, real data may contain additional variables that are only observed at the highest level.
For example, while scores of student achievement are nested within schools, variables of school resources such as the number of computer labs or the number of elective courses are only measured at the school level. The two-level SEM model studied in [34] does not contain variables that are only observed at level-2. Liang and Bentler [16] formulated a more general model that contains both level-1 and level-2 observed variables. This model can be expressed as

$$\begin{pmatrix} z_j \\ y_{ij} \end{pmatrix} = \begin{pmatrix} z_j \\ v_j \end{pmatrix} + \begin{pmatrix} 0 \\ u_{ij} \end{pmatrix}, \quad i = 1, 2, \ldots, n_j;\ j = 1, 2, \ldots, J, \tag{1}$$

where the observable variables are on the left side and the hypothesized generating variables are on the right side. There are two types of observable variables: the $q$-dimensional vector $z_j$ varies at level-2 (between) only, and the $p$-dimensional vector $y_{ij}$ varies at both level-1 (within) and level-2. The vector $u_{ij}$ represents the level-1 components of $y_{ij}$, while $v_j$ represents the level-2 components; both are $p$-dimensional. With model (1) it is typically assumed that $b_j = (z_j', v_j')'$ and $u_{ij}$ are statistically independent. Following Liang and Bentler's [16] setup, we assume $E(u_{ij}) = 0$. Denote

$$\mu = E\begin{pmatrix} z_j \\ v_j \end{pmatrix} = \begin{pmatrix} \mu_z \\ \mu_v \end{pmatrix}, \quad \Sigma_b = \mathrm{Cov}\begin{pmatrix} z_j \\ v_j \end{pmatrix} = \begin{pmatrix} \Sigma_{zz} & \Sigma_{zv} \\ \Sigma_{vz} & \Sigma_{vv} \end{pmatrix}, \quad \Sigma_w = \mathrm{Cov}(u_{ij}).$$
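As a rough illustration of how model (1) generates data, the following sketch draws a toy data set with $p = q = 1$. The function name `simulate_two_level`, the standard normal draws, the unit variances, and the independence of $z_j$ and $v_j$ are all assumptions made for this example; the model itself allows nonnormal components and a general $\Sigma_b$.

```python
import random

def simulate_two_level(cluster_sizes, seed=0):
    """Toy draw from model (1) with p = q = 1 (illustrative assumptions only)."""
    rng = random.Random(seed)
    clusters = []
    for n_j in cluster_sizes:
        z_j = rng.gauss(0.0, 1.0)                      # observed at level-2 only
        v_j = rng.gauss(0.0, 1.0)                      # level-2 component of y_ij
        u = [rng.gauss(0.0, 1.0) for _ in range(n_j)]  # level-1 components, E(u_ij) = 0
        y = [v_j + u_ij for u_ij in u]                 # y_ij = v_j + u_ij
        clusters.append({"z": z_j, "v": v_j, "u": u, "y": y})
    return clusters

# Three clusters with unequal level-1 sample sizes n_j.
clusters = simulate_two_level([3, 5, 2])
```

Only `z` and `y` would be observed in practice; `v` and `u` are kept here so the decomposition in (1) can be inspected.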

Let $\bar y_{.j} = \sum_{i=1}^{n_j} y_{ij}/n_j$. Then the covariance matrix of $t_j = (z_j', \bar y_{.j}')'$ is

$$\Sigma_j = \mathrm{Cov}(t_j) = \begin{pmatrix} \Sigma_{zz} & \Sigma_{zv} \\ \Sigma_{vz} & \Sigma_{vv} + n_j^{-1}\Sigma_w \end{pmatrix}.$$

With interesting structures $\mu(\theta)$, $\Sigma_b(\theta)$ and $\Sigma_w(\theta)$, $-2$ times the normal distribution based log likelihood function of (1) is (see [16,35])

$$l(\theta) = (N - J)\{\log|\Sigma_w(\theta)| + \mathrm{tr}[\Sigma_w^{-1}(\theta)S_y]\} + \sum_{j=1}^{J}\{\log|\Sigma_j(\theta)| + \mathrm{tr}[\Sigma_j^{-1}(\theta)R_j(\theta)]\}, \tag{2}$$

where $N = n_1 + \cdots + n_J$, with

$$S_y = \frac{1}{N - J}\sum_{j=1}^{J} n_j S_j, \quad S_j = \frac{1}{n_j}\sum_{i=1}^{n_j}(y_{ij} - \bar y_{.j})(y_{ij} - \bar y_{.j})'$$

and $R_j(\theta) = [t_j - \mu(\theta)][t_j - \mu(\theta)]'$. Let $\hat\theta$ be the parameter value that minimizes (2). Liang and Bentler [16] gave an EM-algorithm for obtaining $\hat\theta$. Yuan and Bentler [35] provided alternative statistics for overall model evaluation. However, with nonnormal data, formulae for obtaining a consistent covariance matrix of parameter estimates for this model have not been provided in the literature. In this section, we will first present the asymptotic distribution of $\hat\theta$ and a consistent estimator of its covariance matrix before relating the covariance matrix to skewness and kurtosis.

We will implicitly assume the standard regularity conditions as in [38], which ensure the consistency and asymptotic normality of $\hat\theta$. The consistency and asymptotic normality hold in general only when $J \to \infty$. Special results can be obtained when the average level-1 sample size $\bar n = N/J$ also approaches infinity, which will be explicitly stated.

For a $p \times p$ symmetric matrix $A$, let $\mathrm{vec}(A)$ be the vector formed by stacking the columns of $A$ and $\mathrm{vech}(A)$ be the subvector of $\mathrm{vec}(A)$ that only contains the elements on and below the diagonal of $A$. Then there is a duplication matrix $D_p$ such that $\mathrm{vec}(A) = D_p\mathrm{vech}(A)$ [19]. Notice that $\Sigma_j$ is a matrix of dimension $p + q$. Denote

$$s_y = \mathrm{vech}(S_y), \quad s_j = \mathrm{vech}(S_j), \quad r_j(\theta) = \mathrm{vech}[R_j(\theta)],$$
$$\sigma_w = \mathrm{vech}(\Sigma_w), \quad \sigma_b = \mathrm{vech}(\Sigma_b), \quad \sigma_j = \mathrm{vech}(\Sigma_j),$$
$$W_w = 2^{-1}D_p'(\Sigma_w^{-1} \otimes \Sigma_w^{-1})D_p, \quad W_j = 2^{-1}D_{(p+q)}'(\Sigma_j^{-1} \otimes \Sigma_j^{-1})D_{(p+q)}.$$
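The vec, vech and duplication-matrix notation above can be made concrete with a small sketch in plain Python. The helper names (`vech_pairs`, `duplication_matrix`, `matvec`) are hypothetical and chosen for this illustration; the construction follows the column-stacking convention stated in the text.

```python
def vech_pairs(p):
    """Index pairs (i, j) with i >= j, column by column: the entries kept by vech."""
    return [(i, j) for j in range(p) for i in range(j, p)]

def vec(A):
    """Stack the columns of a square matrix A into one list."""
    p = len(A)
    return [A[i][j] for j in range(p) for i in range(p)]

def vech(A):
    """Keep only the elements on and below the diagonal, column by column."""
    return [A[i][j] for (i, j) in vech_pairs(len(A))]

def duplication_matrix(p):
    """Build the p^2 x p(p+1)/2 matrix D_p with vec(A) = D_p vech(A) for symmetric A."""
    pairs = vech_pairs(p)
    col = {pair: k for k, pair in enumerate(pairs)}
    D = [[0] * len(pairs) for _ in range(p * p)]
    for j in range(p):
        for i in range(p):
            # Row j*p + i of vec(A) holds A[i][j]; by symmetry it equals the
            # vech entry indexed by (max(i, j), min(i, j)).
            D[j * p + i][col[(max(i, j), min(i, j))]] = 1
    return D

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

A = [[4, 1, 2],
     [1, 5, 3],
     [2, 3, 6]]
D3 = duplication_matrix(3)
reconstructed = matvec(D3, vech(A))
```

For this symmetric $3 \times 3$ example, `reconstructed` coincides with `vec(A)`, which is the identity $\mathrm{vec}(A) = D_p\mathrm{vech}(A)$ used throughout the section.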

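We will use a dot on top of a function to denote the derivative. For example, $\dot\mu(\theta) = d\mu(\theta)/d\theta'$. The argument of a function will be omitted when it is evaluated at the population value. The notation $\xrightarrow{L}$ implies convergence in distribution. Let

$$A_J^{(1)} = (\bar n - 1)\dot\sigma_w' W_w \dot\sigma_w + \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_j' W_j \dot\sigma_j + \frac{1}{J}\sum_{j=1}^{J}\dot\mu'\Sigma_j^{-1}\dot\mu, \tag{3}$$

$$\begin{aligned}
B_J^{(1)} = {} & \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_w' W_w \mathrm{Var}(n_j s_j) W_w \dot\sigma_w + \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_j' W_j \mathrm{Var}(r_j) W_j \dot\sigma_j + \frac{1}{J}\sum_{j=1}^{J}\dot\mu'\Sigma_j^{-1}\dot\mu \\
& + \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_w' W_w \mathrm{Cov}(n_j s_j, r_j) W_j \dot\sigma_j + \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_j' W_j \mathrm{Cov}(r_j, n_j s_j) W_w \dot\sigma_w \\
& + \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_w' W_w \mathrm{Cov}(n_j s_j, t_j)\Sigma_j^{-1}\dot\mu + \frac{1}{J}\sum_{j=1}^{J}\dot\mu'\Sigma_j^{-1}\mathrm{Cov}(t_j, n_j s_j) W_w \dot\sigma_w \\
& + \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_j' W_j \mathrm{Cov}(r_j, t_j)\Sigma_j^{-1}\dot\mu + \frac{1}{J}\sum_{j=1}^{J}\dot\mu'\Sigma_j^{-1}\mathrm{Cov}(t_j, r_j) W_j \dot\sigma_j
\end{aligned} \tag{4}$$

and

$$\Omega_J^{(1)} = A_J^{(1)-1} B_J^{(1)} A_J^{(1)-1}. \tag{5}$$

The following result characterizes the asymptotic distribution of $\hat\theta$.

Theorem 1. When $\Omega^{(1)} = \lim_{J\to\infty}\Omega_J^{(1)}$ exists, we have

$$\sqrt{J}(\hat\theta - \theta_0) \xrightarrow{L} N(0, \Omega^{(1)}). \tag{6}$$

A consistent estimator of $\Omega^{(1)}$ is given by $\hat\Omega_J^{(1)} = \hat A_J^{(1)-1}\hat B_J^{(1)}\hat A_J^{(1)-1}$, where $\hat A_J^{(1)}$ and $\hat B_J^{(1)}$ are obtained when replacing the unknown parameters in $A_J^{(1)}$ and $B_J^{(1)}$ by $\hat\theta$, and $\mathrm{Var}(n_j s_j)$, $\mathrm{Var}(r_j)$, $\mathrm{Cov}(n_j s_j, r_j)$, $\mathrm{Cov}(n_j s_j, t_j)$ and $\mathrm{Cov}(r_j, t_j)$ in $B_J^{(1)}$ by, respectively,

$$\widehat{\mathrm{Var}}(n_j s_j) = [n_j s_j - (n_j - 1)\hat\sigma_w][n_j s_j - (n_j - 1)\hat\sigma_w]',$$
$$\widehat{\mathrm{Var}}(r_j) = [r_j(\hat\mu) - \hat\sigma_j][r_j(\hat\mu) - \hat\sigma_j]',$$
$$\widehat{\mathrm{Cov}}(n_j s_j, r_j) = [n_j s_j - (n_j - 1)\hat\sigma_w][r_j(\hat\mu) - \hat\sigma_j]',$$
$$\widehat{\mathrm{Cov}}(n_j s_j, t_j) = [n_j s_j - (n_j - 1)\hat\sigma_w](t_j - \hat\mu)',$$
$$\widehat{\mathrm{Cov}}(r_j, t_j) = [r_j(\hat\mu) - \hat\sigma_j](t_j - \hat\mu)'.$$

Note that the consistency of $\hat\Omega_J^{(1)}$ is with respect to $J \to \infty$ while the $n_j$'s are uniformly bounded. The distribution in Theorem 1 becomes degenerate when the $n_j$'s are unbounded such that $\bar n \to \infty$. We will discuss this particular case in the following subsections. More elaborate estimators of $\Omega^{(1)}$ for a less general two-level SEM were discussed in [34].

2.1. For parameters that are shared by $\mu(\cdot)$, $\Sigma_b(\cdot)$ and $\Sigma_w(\cdot)$

The model parameterization in (2) does not distinguish parameters in different parts of the model. In practice, parameters in $\mu(\cdot)$, $\Sigma_b(\cdot)$ and $\Sigma_w(\cdot)$ may not totally overlap: $\theta$ contains parameters that are shared by $\mu(\cdot)$, $\Sigma_b(\cdot)$ and $\Sigma_w(\cdot)$ and those that are not shared by them. Our characterization of SEs in this subsection is only for estimates of parameters that are shared by $\mu(\cdot)$, $\Sigma_b(\cdot)$, $\Sigma_w(\cdot)$, and we will refer to them as the common parameters. Note that

$$\begin{pmatrix} 0_{q\times 1} \\ u_{ij} \end{pmatrix} = \begin{pmatrix} 0_{q\times p} \\ I_p \end{pmatrix} u_{ij} = C u_{ij}.$$

Let $b_{j0} = b_j - \mu_0$ and denote

$$\Delta_b = E[b_{j0}\,\mathrm{vech}'(b_{j0}b_{j0}')], \quad \Gamma_b = \mathrm{Var}[\mathrm{vech}(b_{j0}b_{j0}')],$$
$$\Delta_u = E[u_{ij}\,\mathrm{vech}'(u_{ij}u_{ij}')], \quad \Gamma_u = \mathrm{Var}[\mathrm{vech}(u_{ij}u_{ij}')],$$
$$\Delta_{cu} = E\{(Cu_{ij})\mathrm{vech}'(u_{ij}u_{ij}')\} = C\Delta_u,$$
$$\Gamma_{cu} = \mathrm{Cov}\{\mathrm{vech}[(Cu_{ij})(Cu_{ij})'], \mathrm{vech}(u_{ij}u_{ij}')\} = D_{p+q}^{+}(C \otimes C)D_p\Gamma_u,$$
$$\Delta_{cuc} = E\{(Cu_{ij})\mathrm{vech}'[(Cu_{ij})(Cu_{ij})']\} = C\Delta_u D_p'(C' \otimes C')D_{p+q}^{+\prime},$$
$$\Gamma_{cuc} = \mathrm{Var}\{\mathrm{vech}[(Cu_{ij})(Cu_{ij})']\} = D_{p+q}^{+}[(C \otimes C)D_p\Gamma_u D_p'(C' \otimes C')]D_{p+q}^{+\prime},$$
$$V_b = 2D_{p+q}^{+}(\Sigma_b \otimes \Sigma_b)D_{p+q}^{+\prime}, \quad V_w = 2D_p^{+}(\Sigma_w \otimes \Sigma_w)D_p^{+\prime}, \quad V_j = 2D_{p+q}^{+}(\Sigma_j \otimes \Sigma_j)D_{p+q}^{+\prime},$$
$$V_{cw} = 2D_{p+q}^{+}[(C\Sigma_w) \otimes (C\Sigma_w)]D_p^{+\prime}, \quad V_{cwc} = 2D_{p+q}^{+}[(C\Sigma_w C') \otimes (C\Sigma_w C')]D_{p+q}^{+\prime}.$$

It can be easily verified that $W_w = V_w^{-1}$, $W_j = V_j^{-1}$, and for normal data $\Gamma_b$, $\Gamma_u$, $\Gamma_{cu}$ and $\Gamma_{cuc}$ reduce to $V_b$, $V_w$, $V_{cw}$ and $V_{cwc}$, respectively. Relating the covariance matrix in (5) to the third- and fourth-order moment matrix, we have the following result.

Theorem 2. The covariance matrix $\Omega_J^{(1)}$ in (5) can be decomposed as

$$\Omega_J^{(1)} = A_J^{(1)-1} + A_J^{(1)-1} E_J^{(1)} A_J^{(1)-1}, \tag{7}$$

where $A_J^{(1)}$ given in (3) corresponds to the information matrix based on the normal distribution assumption and

$$\begin{aligned}
E_J^{(1)} = {} & \frac{1}{J}\sum_{j=1}^{J}\left(n_j - 2 + \frac{1}{n_j}\right)\dot\sigma_w' W_w(\Gamma_u - V_w)W_w\dot\sigma_w \\
& + \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_j' W_j\left[(\Gamma_b - V_b) + \frac{1}{n_j^3}(\Gamma_{cuc} - V_{cwc})\right]W_j\dot\sigma_j \\
& + \frac{1}{J}\sum_{j=1}^{J}\frac{(n_j - 1)}{n_j^2}\left\{\dot\sigma_w' W_w(\Gamma_{cu} - V_{cw})'W_j\dot\sigma_j + \dot\sigma_j' W_j(\Gamma_{cu} - V_{cw})W_w\dot\sigma_w\right\} \\
& + \frac{1}{J}\sum_{j=1}^{J}\left(1 - \frac{1}{n_j}\right)\left\{\dot\mu'\Sigma_j^{-1}\Delta_{cu}W_w\dot\sigma_w + \dot\sigma_w' W_w\Delta_{cu}'\Sigma_j^{-1}\dot\mu\right\} \\
& + \frac{1}{J}\sum_{j=1}^{J}\left\{\dot\mu'\Sigma_j^{-1}\left(\Delta_b + \frac{1}{n_j^2}\Delta_{cuc}\right)W_j\dot\sigma_j + \dot\sigma_j' W_j\left(\Delta_b + \frac{1}{n_j^2}\Delta_{cuc}\right)'\Sigma_j^{-1}\dot\mu\right\}.
\end{aligned} \tag{8}$$

Note that when data are normally distributed, $E_J^{(1)} = 0$ and $\Omega^{(1)} = \lim_{J\to\infty}A_J^{(1)-1}$. The matrix $A_J^{(1)}$ does not depend on the underlying distributions of $z_j$ and $y_{ij}$. Nonnormal data affect $\Omega_J^{(1)}$ by $E_J^{(1)}$ through skewnesses $\Delta_b$, $\Delta_u$ and kurtoses $\Gamma_b$, $\Gamma_u$. Specifically, the kurtosis $\Gamma_u$ of the within-level components $u_{ij}$ affects $\Omega_J^{(1)}$ mainly through the first term in (8). This influence is also accelerated by the average level-1 sample size $\bar n$. The kurtosis $\Gamma_u$ of $u_{ij}$ also appears in the second and third terms in (8), but its effect is downplayed by the level-1 sample sizes $n_j$. The kurtosis $\Gamma_b$ of the between-level components $b_j$ affects $\Omega_J^{(1)}$ by the second term in (8). The skewnesses $\Delta_u$ and $\Delta_b$ of the within- and between-level components affect $\Omega_J^{(1)}$ through the fourth and fifth terms in (8). It follows from Theorem 2 that the additional variance covariance matrix of the common parameter estimate $\hat\theta$ is proportional to the level-1 and level-2 kurtoses as well as level-1 and level-2 skewnesses.

The above conclusion is based on $J \to \infty$ and a small $\bar n$. When $\bar n$ is large, we can use $\bar n \to \infty$ as an approximation to obtain a more elegant conclusion. Denote by $O(a_n)$ a matrix whose elements are of order $O(a_n)$ (see [3, Chapter 14]). It follows from (3), (7) and (8) that $A_J^{(1)} = O(\bar n)$ and $\Omega_J^{(1)} = O(\bar n^{-1})$. We need to restandardize (6) so that the limiting distribution is not degenerate when $\bar n \to \infty$. Let $\Omega_J^{(2)} = \bar n\,\Omega_J^{(1)}$, then the asymptotic covariance matrix of $\sqrt{N}(\hat\theta - \theta_0)$ is $\Omega^{(2)} = \lim\Omega_J^{(2)}$ and we have the following result.

Corollary 1. There exist

$$\Omega_J^{(2)} = (A_J^{(1)}/\bar n)^{-1} + (A_J^{(1)}/\bar n)^{-1}\dot\sigma_w' W_w(\Gamma_u - V_w)W_w\dot\sigma_w(A_J^{(1)}/\bar n)^{-1} + O(\bar n^{-1}) \tag{9a}$$

and

$$\Omega^{(2)} = (\dot\sigma_w' W_w\dot\sigma_w)^{-1} + (\dot\sigma_w' W_w\dot\sigma_w)^{-1}\dot\sigma_w' W_w(\Gamma_u - V_w)W_w\dot\sigma_w(\dot\sigma_w' W_w\dot\sigma_w)^{-1}. \tag{9b}$$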
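A scalar special case may help in reading (9b). The sketch below assumes $p = 1$ and a single parameter $\theta = \sigma_w^2$ entering $\Sigma_w$, so that $\dot\sigma_w = 1$, $W_w = 1/(2\sigma_w^4)$, $V_w = 2\sigma_w^4$ and $\Gamma_u = \mu_4 - \sigma_w^4$, where $\mu_4 = E(u_{ij}^4)$. The function name `omega2_scalar` and this parameterization are assumptions made for illustration, not part of the paper.

```python
def omega2_scalar(sigma2, mu4):
    """Evaluate (9b) for p = 1 with theta equal to the within-level variance."""
    sigma_dot = 1.0                        # derivative of sigma_w = theta w.r.t. theta
    W_w = 1.0 / (2.0 * sigma2 ** 2)        # W_w = V_w^{-1}
    V_w = 2.0 * sigma2 ** 2                # normal-theory fourth-moment matrix
    Gamma_u = mu4 - sigma2 ** 2            # Var(u^2) = E(u^4) - sigma^4
    bread = 1.0 / (sigma_dot * W_w * sigma_dot)
    meat = sigma_dot * W_w * (Gamma_u - V_w) * W_w * sigma_dot
    return bread + bread * meat * bread

normal_case = omega2_scalar(1.0, 3.0)      # normal data: E(u^4) = 3 sigma^4
heavy_tailed = omega2_scalar(1.0, 9.0)     # larger fourth moment
```

In this scalar setting the expression collapses to $\mu_4 - \sigma_w^4$, the familiar asymptotic variance of a sample variance, so the normal-theory value $2\sigma_w^4$ understates the variance whenever $\mu_4 > 3\sigma_w^4$.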

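The matrix $(A_J^{(1)}/\bar n)^{-1}$ is the covariance matrix of $\sqrt{N}(\hat\theta - \theta_0)$ corresponding to the normal distribution assumption. The second term in (9a) or (9b) is due to nonnormal data. So for large $\bar n$ the larger SEs of the common parameter estimates $\hat\theta$ are mainly caused by $\Gamma_u$, the within-level kurtosis.

2.2. For parameters that are not shared by $\mu(\cdot)$, $\Sigma_b(\cdot)$ and $\Sigma_w(\cdot)$

To facilitate the study of SEs of estimates for parameters that are not shared by $\mu(\cdot)$, $\Sigma_b(\cdot)$ and $\Sigma_w(\cdot)$, we assume their parameters are totally separated. That is, $\theta = (\theta_m', \theta_b', \theta_w')'$ and the structures are $\mu(\theta_m)$, $\Sigma_b(\theta_b)$ and $\Sigma_w(\theta_w)$. Denote

$$W_{cj} = 2^{-1}D_p'[(C'\Sigma_j^{-1}) \otimes (C'\Sigma_j^{-1})]D_{p+q} \quad\text{and}\quad W_{cjc} = 2^{-1}D_p'[(C'\Sigma_j^{-1}C) \otimes (C'\Sigma_j^{-1}C)]D_p.$$

Let $\hat\theta = (\hat\theta_m', \hat\theta_b', \hat\theta_w')'$ minimize (2), and let

$$A_J^{(3)} = \begin{pmatrix} A_{mm} & 0 & 0 \\ 0 & A_{bb} & A_{bw} \\ 0 & A_{wb} & A_{ww} \end{pmatrix}, \tag{10a}$$

where

$$A_{mm} = \frac{1}{J}\sum_{j=1}^{J}\dot\mu'\Sigma_j^{-1}\dot\mu, \tag{10b}$$

$$A_{bb} = \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_b' W_j\dot\sigma_b, \quad A_{bw} = \frac{1}{J}\sum_{j=1}^{J}\frac{1}{n_j}\dot\sigma_b' W_{cj}'\dot\sigma_w, \tag{10c}$$

$$A_{wb} = A_{bw}', \quad A_{ww} = (\bar n - 1)\dot\sigma_w' W_w\dot\sigma_w + \frac{1}{J}\sum_{j=1}^{J}\frac{1}{n_j^2}\dot\sigma_w' W_{cjc}\dot\sigma_w, \tag{10d}$$

and

$$E_J^{(3)} = \begin{pmatrix} 0 & E_{mb} & E_{mw} \\ E_{bm} & E_{bb} & E_{bw} \\ E_{wm} & E_{wb} & E_{ww} \end{pmatrix},$$

where

$$E_{mb} = \frac{1}{J}\sum_{j=1}^{J}\dot\mu'\Sigma_j^{-1}\left(\Delta_b + \frac{1}{n_j^2}\Delta_{cuc}\right)W_j\dot\sigma_b, \tag{11a}$$

$$E_{mw} = \frac{1}{J}\sum_{j=1}^{J}\left(1 - \frac{1}{n_j}\right)\dot\mu'\Sigma_j^{-1}\Delta_{cu}W_w\dot\sigma_w + \frac{1}{J}\sum_{j=1}^{J}\frac{1}{n_j}\dot\mu'\Sigma_j^{-1}\left(\Delta_b + \frac{1}{n_j^2}\Delta_{cuc}\right)W_{cj}'\dot\sigma_w, \tag{11b}$$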

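$$E_{bm} = E_{mb}', \quad E_{bb} = \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_b' W_j\left[(\Gamma_b - V_b) + \frac{1}{n_j^3}(\Gamma_{cuc} - V_{cwc})\right]W_j\dot\sigma_b, \tag{11c}$$

$$E_{bw} = \frac{1}{J}\sum_{j=1}^{J}\frac{(n_j - 1)}{n_j^2}\dot\sigma_b' W_j(\Gamma_{cu} - V_{cw})W_w\dot\sigma_w + \frac{1}{J}\sum_{j=1}^{J}\frac{1}{n_j}\dot\sigma_b' W_j\left[(\Gamma_b - V_b) + \frac{1}{n_j^3}(\Gamma_{cuc} - V_{cwc})\right]W_{cj}'\dot\sigma_w, \tag{11d}$$

$$\begin{aligned}
E_{wm} = E_{mw}', \quad E_{wb} = E_{bw}', \quad E_{ww} = {} & \frac{1}{J}\sum_{j=1}^{J}\left(n_j - 2 + \frac{1}{n_j}\right)\dot\sigma_w' W_w(\Gamma_u - V_w)W_w\dot\sigma_w \\
& + \frac{1}{J}\sum_{j=1}^{J}\frac{1}{n_j^2}\dot\sigma_w' W_{cj}\left[(\Gamma_b - V_b) + \frac{1}{n_j^3}(\Gamma_{cuc} - V_{cwc})\right]W_{cj}'\dot\sigma_w \\
& + \frac{1}{J}\sum_{j=1}^{J}\frac{(n_j - 1)}{n_j^3}\left[\dot\sigma_w' W_w(\Gamma_{cu} - V_{cw})'W_{cj}'\dot\sigma_w + \dot\sigma_w' W_{cj}(\Gamma_{cu} - V_{cw})W_w\dot\sigma_w\right]
\end{aligned} \tag{11e}$$

and

$$\Omega_J^{(3)} = A_J^{(3)-1} + A_J^{(3)-1}E_J^{(3)}A_J^{(3)-1}. \tag{12}$$

The following theorem characterizes the distribution of $\hat\theta$.

Theorem 3. When $\Omega^{(3)} = \lim_{J\to\infty}\Omega_J^{(3)}$ exists, we have

$$\sqrt{J}(\hat\theta - \theta_0) \xrightarrow{L} N(0, \Omega^{(3)}). \tag{13}$$

Note that the $A_J^{(3)}$ in (10a) corresponds to the normal distribution based information matrix. It follows from Theorem 3 and (12) that the larger SEs of $\hat\theta$ are caused by $E_J^{(3)}$. When data are normally distributed, $E_J^{(3)} = 0$ and SEs based on $\hat\Omega_J^{(3)} = \hat A_J^{(3)-1}$ are asymptotically correct. Many terms in (10) and (11) involve $J^{-1}\sum_{j=1}^{J}n_j^{-1}$. To better understand the extra SEs caused by $E_J^{(3)}$, we need to properly quantify the magnitude of $J^{-1}\sum_{j=1}^{J}n_j^{-1}$. For such a purpose, we assume that the $n_j$'s are evenly distributed on $[n_a + 1, n_a + J]$. We have the following result.

Lemma 1. Let

$$A_J^{(3)-1} = \begin{pmatrix} A^{mm} & 0 & 0 \\ 0 & A^{bb} & A^{bw} \\ 0 & A^{wb} & A^{ww} \end{pmatrix}.$$

Then

$$A^{mm} = A_{mm}^{-1} = O(1), \tag{14a}$$
$$A^{bb} = A_{bb}^{-1} + O[\bar n^{-1}J^{-2}\ln^2(1 + J/n_a)], \tag{14b}$$
$$A_{bb}^{-1} = (\dot\sigma_b' W_b\dot\sigma_b)^{-1} + O[J^{-1}\ln(1 + J/n_a)], \tag{14c}$$
$$A^{ww} = A_{ww}^{-1} + O[\bar n^{-2}J^{-2}\ln^2(1 + J/n_a)], \tag{14d}$$
$$A_{ww}^{-1} = (\bar n - 1)^{-1}(\dot\sigma_w' W_w\dot\sigma_w)^{-1} + O(\bar n^{-2}J^{-1}), \tag{14e}$$
$$A^{bw} = O[\bar n^{-1}J^{-1}\ln(1 + J/n_a)]. \tag{14f}$$

Lemma 1 implies that, for normally distributed data, $\hat\theta_b$ and $\hat\theta_w$ are approximately independent when either $\bar n$ or $J$ is large. It also implies that, as $\bar n \to \infty$, the SEs of $\sqrt{J}(\hat\theta_w - \theta_{w0})$ approach zero for normally distributed data. As we shall see, this is also true in general for nonnormally distributed data. In order to better study the effect of skewness and kurtosis on the SEs of the within-level parameter estimates $\hat\theta_w$, we further let $\hat\gamma = (\hat\theta_m', \hat\theta_b', (\bar n - 1)^{1/2}\hat\theta_w')'$ and $G = \mathrm{diag}(I_m, I_b, (\bar n - 1)^{1/2}I_w)$, where $I_m$, $I_b$, $I_w$ are identity matrices of proper dimensions. Let

$$\Omega_J^{(4)} = G\Omega_J^{(3)}G = GA_J^{(3)-1}G + H, \tag{15a}$$

then

$$GA_J^{(3)-1}G = \begin{pmatrix} A^{mm} & 0 & 0 \\ 0 & A^{bb} & (\bar n - 1)^{1/2}A^{bw} \\ 0 & (\bar n - 1)^{1/2}A^{wb} & (\bar n - 1)A^{ww} \end{pmatrix}, \tag{15b}$$

$$H = GA_J^{(3)-1}E_J^{(3)}A_J^{(3)-1}G = \begin{pmatrix} H_{mm} & H_{mb} & H_{mw} \\ H_{bm} & H_{bb} & H_{bw} \\ H_{wm} & H_{wb} & H_{ww} \end{pmatrix}. \tag{15c}$$

The asymptotic distribution of $\hat\gamma$ is characterized by the following result.

Theorem 4. Let $\Omega^{(4)} = \lim_{J\to\infty}\Omega_J^{(4)}$, we have

$$\sqrt{J}(\hat\gamma - \gamma_0) \xrightarrow{L} N(0, \Omega^{(4)}), \tag{16}$$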

and there exist

$$H_{mm} = 0, \quad H_{mb} = A_{mm}^{-1}\frac{1}{J}\sum_{j=1}^{J}\dot\mu'\Sigma_j^{-1}\Delta_b W_j\dot\sigma_b A^{bb} + O(1/J), \tag{17a}$$

$$H_{mw} = (\bar n - 1)^{1/2}A_{mm}^{-1}\frac{1}{J}\sum_{j=1}^{J}\dot\mu'\Sigma_j^{-1}\Delta_{cu}W_w\dot\sigma_w A^{ww} + O[\bar n^{-1/2}J^{-1}\ln(1 + J/n_a)], \tag{17b}$$

$$H_{bb} = A_{bb}^{-1}\frac{1}{J}\sum_{j=1}^{J}\dot\sigma_b' W_j(\Gamma_b - V_b)W_j\dot\sigma_b A_{bb}^{-1} + O[\bar n^{-1}J^{-2}\ln^2(1 + J/n_a)], \tag{17c}$$

$$H_{bw} = O[\bar n^{-1/2}J^{-1}\ln(1 + J/n_a)], \tag{17d}$$

$$H_{ww} = (\bar n - 1)(\bar n - 2)A_{ww}^{-1}[\dot\sigma_w' W_w(\Gamma_u - V_w)W_w\dot\sigma_w]A_{ww}^{-1} + O[\bar n^{-1}J^{-1}\ln(1 + J/n_a)]. \tag{17e}$$

Because SEs of $\sqrt{J}(\hat\gamma - \gamma_0)$ using $\hat\Omega_J^{(4)} = G\hat A_J^{(3)-1}G$ are the normal distribution based procedure, $H_{mm} = 0$ in (17a) implies that the skewness or kurtosis of between- or within-level random components does not affect SEs of $\hat\theta_m$. It also follows from (17a) that $\Delta_b$ is mainly responsible for the covariances between $\hat\theta_m$ and $\hat\theta_b$. Eq. (17b) implies that, when $\bar n$ is small and $J$ is large, $\Delta_u$ is responsible for the covariances between $\hat\theta_m$ and $\hat\theta_w$. It follows from (14d) and (14e) that $A^{ww} = O(\bar n^{-1})$, so (17b) implies that $\hat\theta_m$ and $(\bar n - 1)^{1/2}\hat\theta_w$ are not correlated when $\bar n$ approaches infinity even when $\Delta_u \neq 0$. It follows from (17c) that, when $\bar n$ or $J$ approaches infinity, the within-level kurtosis has no effect on the SEs of the between-level parameter estimates. Eq. (17d) implies that $\hat\theta_b$ and $(\bar n - 1)^{1/2}\hat\theta_w$ are asymptotically independent when either $J$ or $\bar n$ approaches infinity. It follows from (17e) that $\Gamma_u$ is mainly responsible for the larger SEs of the within-level parameter estimates $\hat\theta_w$. When $\bar n$ or $J$ approaches infinity, the between-level kurtosis has no effect on the SEs of the within-level parameter estimates.

3. Asymptotic robustness of standard errors

In order to characterize the asymptotic robustness of SEs we first introduce a class of nonnormal distributions given by Yuan and Bentler [33]. Our study of the asymptotic SEs is based on this class of distributions.
Properties of SEs when the between- or within-level random components follow the class of elliptical distributions are a special case of the more general results presented below.

Let $\xi_1, \ldots, \xi_m$ be independent random variables with $E(\xi_j) = 0$, $E(\xi_j^2) = 1$, $E(\xi_j^3) = \zeta_j$, $E(\xi_j^4) = \kappa_j$, and $\xi = (\xi_1, \ldots, \xi_m)'$. Let $r$ be a random variable which is independent of $\xi$, $E(r^2) = 1$, $E(r^3) = \gamma$, and $E(r^4) = \tau$. Also, let $m \geq d$ and $L = (l_{ij}) = (l_1, \ldots, l_m)$ be a $d \times m$ matrix of rank $d$ such that $LL' = \Sigma$, where $l_j = (l_{1j}, \ldots, l_{dj})'$. Then the random vector

$$x = rL\xi \tag{18}$$

generally follows a nonnormal distribution. Different distributions are obtained by choosing a different set of $\xi_j$'s, $L$ and $r$. It is easy to see that the population covariance matrix of $x$ is given by $\Sigma$. Yuan and Bentler [33] obtained the fourth-order moment matrix $\Gamma = \mathrm{Cov}[\mathrm{vech}(xx')]$ as

$$\Gamma = 2\tau D_p^{+}(\Sigma \otimes \Sigma)D_p^{+\prime} + (\tau - 1)\sigma\sigma' + \tau\sum_{j=1}^{m}(\kappa_j - 3)\mathrm{vech}(l_j l_j')\mathrm{vech}'(l_j l_j'). \tag{19}$$

When all the $\kappa_j$'s equal 3, then the $\Gamma$ in (19) reduces to that corresponding to an elliptical distribution (see [4]). Yuan and Bentler [33] called the corresponding distribution of $x$ in (18) a pseudo-elliptical distribution, since it is no longer symmetric. When $\tau = 1$ in addition to $\kappa_j = 3$, the corresponding distribution of $x$ in (18) was called a pseudo-normal distribution. It was noted by [33] that for a given matrix $L$, different marginal skewnesses will not affect the $\Gamma$ matrix in (19).

We assume that the between-level random vector $b_{j0}$ follows (18) with $d = p + q$ and has a fourth-order moment matrix

$$\Gamma_b = \tau_b V_b + (\tau_b - 1)\sigma_b\sigma_b' + \tau_b\sum_{j=1}^{m_b}(\kappa_j^{(b)} - 3)\mathrm{vech}(l_j^{(b)}l_j^{(b)\prime})\mathrm{vech}'(l_j^{(b)}l_j^{(b)\prime}). \tag{20}$$

We can only generalize the asymptotic robustness property of SEs from conventional SEM to the multilevel SEM context when all the parameters are separated as in Section 2.2. Let

$$\Omega_J^{(4)} = \begin{pmatrix} \Omega_{mm} & \Omega_{mb} & \Omega_{mw} \\ \Omega_{bm} & \Omega_{bb} & \Omega_{bw} \\ \Omega_{wm} & \Omega_{wb} & \Omega_{ww} \end{pmatrix}.$$

It follows from (15) that $\Omega_{bb} = A^{bb} + H_{bb}$. Combining (14b), (14c) and (17c) yields

$$\Omega_{bb} = (\dot\sigma_b' W_b\dot\sigma_b)^{-1}(\dot\sigma_b' W_b\Gamma_b W_b\dot\sigma_b)(\dot\sigma_b' W_b\dot\sigma_b)^{-1} + O[J^{-1}\ln(1 + J/n_a)]. \tag{21}$$

Suppose the between-level covariance structure $\Sigma_b(\theta_b)$ has $q_b$ parameters with $\theta_b = (\theta_{b1}, \theta_{b2}, \ldots, \theta_{bq_b})'$. Let $R(\dot\sigma_b)$ be the space spanned by the column vectors of $\dot\sigma_b$. We need the following condition for asymptotic robustness:

Condition B. For each of the $j = 1, \ldots, m_b$, $\mathrm{vech}(l_j^{(b)}l_j^{(b)\prime}) \in R(\dot\sigma_b)$.
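Formula (19) can be checked by hand in the univariate case $d = 1$, where $D_p^{+} = 1$, $\Sigma = \sum_j l_j^2$ and $\Gamma = \mathrm{Var}(x^2)$. The sketch below compares (19) with a direct moment calculation for $x = r\sum_j l_j\xi_j$; the function names and the numerical values are hypothetical choices for this illustration.

```python
def gamma_formula(loadings, kappas, tau):
    """Eq. (19) specialized to d = 1, where sigma = Sigma = sum of l_j^2."""
    sigma = sum(l * l for l in loadings)
    extra = sum((k - 3.0) * l ** 4 for l, k in zip(loadings, kappas))
    return 2.0 * tau * sigma ** 2 + (tau - 1.0) * sigma ** 2 + tau * extra

def gamma_direct(loadings, kappas, tau):
    """Var(x^2) = E(x^4) - [E(x^2)]^2 computed from the moments of r and xi_j."""
    sigma = sum(l * l for l in loadings)
    # E[(sum_j l_j xi_j)^4] = sum_j kappa_j l_j^4 + 3 * sum_{j != k} l_j^2 l_k^2,
    # because the xi_j are independent with mean 0 and variance 1.
    fourth = sum(k * l ** 4 for l, k in zip(loadings, kappas))
    fourth += 3.0 * (sigma ** 2 - sum(l ** 4 for l in loadings))
    return tau * fourth - sigma ** 2       # E(r^4) = tau, E(r^2) = 1

loadings, kappas, tau = [1.0, 2.0], [4.0, 2.5], 1.2
from_formula = gamma_formula(loadings, kappas, tau)
from_moments = gamma_direct(loadings, kappas, tau)
```

Both routes give the same value, and setting every $\kappa_j = 3$ with $\tau = 1$ returns the normal-theory value $2\Sigma^2$. Note that the third moments $\zeta_j$ never enter, in line with the remark that marginal skewness does not affect $\Gamma$.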

Note that Condition B implies that there exist vectors $c_b^{(j)} = (c_{b1}^{(j)}, c_{b2}^{(j)}, \ldots, c_{bq_b}^{(j)})'$ such that $\mathrm{vech}(l_j^{(b)}l_j^{(b)\prime}) = \dot\sigma_b c_b^{(j)}$. Because $\Sigma_b = \sum_{j=1}^{m_b}l_j^{(b)}l_j^{(b)\prime}$, Condition B also implies $\sigma_b \in R(\dot\sigma_b)$. Thus there exists a vector $c_b = (c_{b1}, c_{b2}, \ldots, c_{bq_b})'$ such that $\sigma_b = \dot\sigma_b c_b$. Consequently, we can rewrite (21) as

$$\Omega_{bb} = \tau_b(\dot\sigma_b' W_b\dot\sigma_b)^{-1} + (\tau_b - 1)c_b c_b' + \tau_b\sum_{j=1}^{m_b}(\kappa_j^{(b)} - 3)c_b^{(j)}c_b^{(j)\prime} + O[J^{-1}\ln(1 + J/n_a)]. \tag{22}$$

When $\kappa_j^{(b)} = 3$, the $\Gamma_b$ in (20) is identical to that when $b_j$ follows an elliptical symmetric distribution. So, $\lim_{J\to\infty}\Omega_{bb}$ corresponding to a pseudo-elliptical distribution is exactly the same as that corresponding to an elliptical distribution. When $\tau_b = 1$ and $\kappa_j^{(b)} = 3$, $\lim_{J\to\infty}\Omega_{bb} = (\dot\sigma_b' W_b\dot\sigma_b)^{-1}$. This indicates that, when $J$ is large, SEs for $\hat\theta_b$ based on the normal distribution assumption can be used for a skewed data set sampled from a pseudo-normal distribution. For some models, $c_{b1}^{(j)} = \cdots = c_{br}^{(j)} = 0$, $j = 1, \ldots, m_b$ and $c_{b1} = \cdots = c_{br} = 0$ hold from the model hypothesis. This simplifies the upper left $r \times r$ submatrix of $\Omega_{bb}$ to

$$\Omega_{bb}^{(r)} = \tau_b[(\dot\sigma_b' W_b\dot\sigma_b)^{-1}]^{(r)}.$$

Consider the confirmatory factor model for the between-level components

$$b_j = \mu + \Lambda_b f_b + \varepsilon_b$$

and

$$\Sigma_b(\theta_b) = \Lambda_b\Phi_b\Lambda_b' + \Psi_b,$$

where

$$\Lambda_b = \begin{pmatrix} \lambda_{b1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_{bk_b} \end{pmatrix}$$

and all the $\lambda_{bj}$'s are vectors so that each between-level random component only measures one factor, $\Phi_b = \mathrm{Cov}(f_b)$, and $\Psi_b = \mathrm{Cov}(\varepsilon_b) = \mathrm{diag}(\psi_{b11}, \ldots, \psi_{b(p+q)(p+q)})$. Let the model be identified by fixing one of the factor loadings at 1.0 for each factor. When data are generated by (18) with $L = L_b = (\Lambda_b\Phi_b^{1/2}, \Psi_b^{1/2})$, then $m_b = (p + q) + k_b$ in (20). For such a setup, Yuan and Bentler [32] showed that Condition B is satisfied and the asymptotic covariance matrix for estimates of the $r = p + q - k_b$ free factor loadings is $\tau_b[(\dot\sigma_b' W_b\dot\sigma_b)^{-1}]^{(r)}$.
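The choice $L_b = (\Lambda_b\Phi_b^{1/2}, \Psi_b^{1/2})$ can be illustrated with a one-factor toy model ($k_b = 1$, three indicators), for which $L_bL_b'$ should reproduce $\Lambda_b\Phi_b\Lambda_b' + \Psi_b$ and $L_b$ has $m_b = 3 + 1$ columns. The numbers and helper names below are assumptions made purely for this sketch.

```python
import math

def factor_model_L(loadings, phi, psi):
    """Build L = (Lambda * sqrt(phi), Psi^{1/2}) for a single factor with variance phi."""
    d = len(loadings)
    L = []
    for i in range(d):
        row = [loadings[i] * math.sqrt(phi)]                            # common-factor column
        row += [math.sqrt(psi[i]) if k == i else 0.0 for k in range(d)]  # unique-factor columns
        L.append(row)
    return L

def times_transpose(L):
    """Return L L' as a list of lists."""
    d = len(L)
    return [[sum(a * b for a, b in zip(L[i], L[k])) for k in range(d)] for i in range(d)]

loadings, phi, psi = [1.0, 0.8, 0.6], 2.0, [0.5, 0.4, 0.3]   # first loading fixed at 1.0
L_b = factor_model_L(loadings, phi, psi)
Sigma_b = times_transpose(L_b)

# Model-implied covariance Lambda Phi Lambda' + Psi for comparison.
implied = [[phi * loadings[i] * loadings[k] + (psi[i] if i == k else 0.0)
            for k in range(3)] for i in range(3)]
```

The two matrices agree entry by entry, so data generated by (18) with this $L_b$ have the factor-model covariance structure regardless of the skewness and kurtosis chosen for the $\xi_j$'s.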
When $\tau_b = 1$, data generated by (18) can have arbitrary skewness and kurtosis. But SEs of the between-level factor loading estimates provided by the normal distribution assumption are still valid when $J \to \infty$. This extends the result of Anderson and Amemiya [1], Shapiro and Browne [30], Satorra and Bentler [28] and Yuan and Bentler [32] from the context of conventional confirmatory factor analysis to the between-level factor model within a two-level SEM framework. Note that the within-level structure $\Sigma_w(\theta_w)$ can be any identifiable structure; it does not interfere with the asymptotic robustness of the between-level SEs.

Similarly, the normal distribution based SEs for the within-level parameter estimates $\hat\theta_w$ can also be asymptotically robust. It follows from (15), (14d), (14e) and (17e) that

$$\Omega_{ww} = (\dot\sigma_w' W_w \dot\sigma_w)^{-1} + \frac{\bar n - 2}{\bar n - 1}(\dot\sigma_w' W_w \dot\sigma_w)^{-1}[\dot\sigma_w' W_w(\Gamma_u - V_w)W_w \dot\sigma_w](\dot\sigma_w' W_w \dot\sigma_w)^{-1} + O[\bar n^{-1}J^{-1}\ln(1 + J/n_a)]. \tag{23}$$

Suppose the within-level vector $u_{ij}$ is generated by (18) and has a fourth-order moment matrix

$$\Gamma_w = \tau_w V_w + (\tau_w - 1)\sigma_w \sigma_w' + \tau_w\sum_{j=1}^{m_w}(\kappa_j^{(w)} - 3)\,\mathrm{vech}(l_j^{(w)} l_j^{(w)\prime})\,\mathrm{vech}'(l_j^{(w)} l_j^{(w)\prime}). \tag{24}$$

Parallel to Condition B, we need the following condition for the asymptotic robustness at the within-level.

Condition W. For each $j = 1, \ldots, m_w$, $\mathrm{vech}(l_j^{(w)} l_j^{(w)\prime}) \in R(\dot\sigma_w)$.

When Condition W is satisfied, there exist $c_w^{(j)}$ and $c_w$ such that

$$\Omega_{ww} = (\dot\sigma_w' W_w \dot\sigma_w)^{-1} + \frac{\bar n - 2}{\bar n - 1}\Big\{(\tau_w - 1)[(\dot\sigma_w' W_w \dot\sigma_w)^{-1} + c_w c_w'] + \tau_w\sum_{j=1}^{m_w}(\kappa_j^{(w)} - 3)c_w^{(j)} c_w^{(j)\prime}\Big\} + O[\bar n^{-1}J^{-1}\ln(1 + J/n_a)]. \tag{25}$$

When $\kappa_j^{(w)} = 3$, the limiting $\Omega_{ww}$ is identical to that when $u_{ij}$ follows an elliptical symmetric distribution. So, when $\bar n \to \infty$ or $J \to \infty$, the $\Omega_{ww}$ matrix corresponding to $u_{ij}$ following a pseudo-elliptical distribution is exactly the same as that corresponding to $u_{ij}$ following an elliptical distribution. When $\tau_w = 1$ and $\kappa_j^{(w)} = 3$, the limiting $\Omega_{ww}$ equals $(\dot\sigma_w' W_w \dot\sigma_w)^{-1}$. This indicates that, when $J$ or $\bar n$ is large, SEs for $\hat\theta_w$ based on the normality assumption can be used for skewed data sampled from a pseudo-normal distribution.
When $\Sigma_w = \Lambda_w \Phi_w \Lambda_w' + \Psi_w$ and $u_{ij}$ can be represented by (18) with $L = L_w = (\Lambda_w \Phi_w^{1/2}, \Psi_w^{1/2})$, the SEs for the factor loading estimates based on the normality assumption are asymptotically valid for many nonnormal distributions with heterogeneous skewnesses and kurtoses. So the results of Anderson and Amemiya [1], Shapiro and Browne [30], Satorra and Bentler [28] and Yuan and Bentler [32] remain true for within-level factor analysis models in a two-level SEM, regardless of the between-level structure.
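The reason Conditions B and W hold for such factor models can be made concrete in the simplest case. The following is a minimal sketch, assuming a single factor with four indicators, the first loading fixed at 1.0, and hypothetical parameter values; the function names are illustrative and not from the paper. It checks numerically that each $\mathrm{vech}(l_j l_j')$, with $l_j$ a column of $L = (\Lambda\Phi^{1/2}, \Psi^{1/2})$, equals $\dot\sigma c^{(j)}$ for a vector $c^{(j)}$ whose entries for the free loadings are all zero.

```python
import math

def vech(m):
    """Stack the lower triangle (diagonal included) column by column."""
    p = len(m)
    return [m[i][j] for j in range(p) for i in range(j, p)]

def implied_sigma(theta, p):
    """vech of Sigma = phi * lam lam' + diag(psi); the first loading is fixed at 1.
    theta = (free loadings [p-1], factor variance phi [1], unique variances psi [p])."""
    lam = [1.0] + list(theta[:p - 1])
    phi = theta[p - 1]
    psi = theta[p:]
    cov = [[phi * lam[i] * lam[j] + (psi[i] if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    return vech(cov)

def jacobian_columns(theta, p, h=1e-4):
    """Central-difference columns of sigma-dot (one list per parameter)."""
    cols = []
    for k in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[k] += h
        dn[k] -= h
        s_up, s_dn = implied_sigma(up, p), implied_sigma(dn, p)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(s_up, s_dn)])
    return cols

def condition_residuals(theta, p):
    """Max abs entry of vech(l_j l_j') - sigma-dot c^(j) for every column l_j of L."""
    lam = [1.0] + list(theta[:p - 1])
    phi = theta[p - 1]
    psi = theta[p:]
    cols = jacobian_columns(theta, p)
    q = len(theta)
    pairs = []
    # Column of Lambda Phi^{1/2}: only the phi entry of c is nonzero.
    c_factor = [0.0] * q
    c_factor[p - 1] = phi
    pairs.append(([math.sqrt(phi) * x for x in lam], c_factor))
    # Columns of Psi^{1/2}: only the matching psi entry of c is nonzero.
    for i in range(p):
        l_i = [0.0] * p
        l_i[i] = math.sqrt(psi[i])
        c_i = [0.0] * q
        c_i[p + i] = psi[i]
        pairs.append((l_i, c_i))
    residuals = []
    for l, c in pairs:
        target = vech([[a * b for b in l] for a in l])
        fitted = [sum(cols[k][r] * c[k] for k in range(q))
                  for r in range(len(target))]
        residuals.append(max(abs(t - f) for t, f in zip(target, fitted)))
    return residuals

# Hypothetical values: 3 free loadings, phi, 4 unique variances.
theta = [0.8, 1.3, 0.6, 1.5, 0.5, 0.7, 0.4, 0.9]
print(max(condition_residuals(theta, 4)))
```

The residuals vanish up to rounding because the implied covariance is linear in the factor variance and the unique variances, so each rank-one piece of $\Sigma$ is a multiple of a single column of $\dot\sigma$. The zero loading entries in every $c^{(j)}$ are what remove the kurtosis terms from the loading block.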

4. Conclusions

Motivated by typical nonnormal data in social and behavioral sciences and the increasingly popular normal distribution based multilevel methodology, we study how skewness and kurtosis in one level affect SEs of parameter estimates within its level and outside its level. To facilitate the study with different level-1 sample sizes we have assumed that the level-1 sample sizes $n_j$'s are evenly distributed.

For parameters that are shared by the mean, the between- and the within-level covariance structures, the effect of skewnesses and kurtoses on SEs of their estimates depends on the average level-1 sample size $\bar n = N/J$. When $\bar n$ is small, both the between- and the within-level skewnesses and kurtoses affect the SEs linearly. When $\bar n$ is large, however, it is mainly the within-level kurtosis that affects the SEs of the common parameter estimates.

For parameters that are unique to the mean, the between-level covariance structure and the within-level covariance structure, effects of skewness and kurtosis are different on different parameter estimates. First, skewness or kurtosis at either level does not affect the asymptotic SEs of the mean parameter estimates. The estimates of the mean parameters and parameters of the between-level covariance structure are asymptotically correlated in general; this correlation is determined by the between-level skewness. The estimates of the mean parameters and those of parameters of the within-level covariance structure are asymptotically independent when $\bar n \to \infty$ but not when $J \to \infty$. When either $J \to \infty$ or $\bar n \to \infty$, the within-level kurtosis has no effect on the SEs of the between-level parameter estimates. Similarly, when $J \to \infty$ or $\bar n \to \infty$, the between-level kurtosis has no effect on the SEs of the within-level parameter estimates. When either $J \to \infty$ or $\bar n \to \infty$, the between-level and within-level covariance parameter estimates are asymptotically independent.
We also showed that, parallel to the asymptotic robustness of SEs in the conventional SEM model, asymptotic robustness may exist for SEs in a multilevel factor analysis model. For example, under proper conditions, SEs of factor loading estimates at both the between- and within-level are asymptotically robust. Unfortunately, the same limitation as with the conventional SEM model holds: the results of asymptotic robustness may only be observed when $\bar n$ or $J$ is large enough, and the asymptotic robustness Conditions B and W are not verifiable. When data cannot be represented by (18), asymptotic robustness may not hold. In practice, then, it seems more appropriate to compute SEs using (6) than to accept normal theory SEs in the hope that asymptotic robustness would resolve issues related to nonnormality.

In the study we have assumed that the within-level random component vectors $u_{ij}$, $i = 1, \ldots, n_j$, follow the same distribution for all the clusters $j = 1, \ldots, J$. It is possible that the $u_{ij}$'s are distributed differently for different $j$. Additional study in this direction is needed and will provide more detailed results for SEs.

Acknowledgements

We are thankful to a referee and an editor for their comments that have led the paper to a significant improvement over the previous version.

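Appendix A

This appendix will provide the proofs for Theorems 1–4, Corollary 1 and Lemma 1.

Proof of Theorem 1. On differentiating the $l(\theta)$ in (2) we obtain

$$\begin{aligned} dl(\theta) = {} & -(N - J)\,\mathrm{tr}\{\Sigma_w^{-1}(\theta)[S_y - \Sigma_w(\theta)]\Sigma_w^{-1}(\theta)[d\Sigma_w(\theta)]\} \\ & - \sum_{j=1}^{J}\mathrm{tr}\{\Sigma_j^{-1}(\theta)[R_j(\theta) - \Sigma_j(\theta)]\Sigma_j^{-1}(\theta)[d\Sigma_j(\theta)]\} \\ & - 2\sum_{j=1}^{J}[d\mu(\theta)]'\Sigma_j^{-1}(\theta)[t_j - \mu(\theta)]. \end{aligned} \tag{A.1}$$

It follows from (A.1) that the normal estimating function corresponding to $l(\theta)$ is

$$g(\theta) = (\bar n - 1)\dot\sigma_w'(\theta)W_w(\theta)[s_y - \sigma_w(\theta)] + \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_j'(\theta)W_j(\theta)[r_j(\theta) - \sigma_j(\theta)] + \frac{1}{J}\sum_{j=1}^{J}\dot\mu'(\theta)\Sigma_j^{-1}(\theta)[t_j - \mu(\theta)].$$

Applying the Taylor expansion on $g(\hat\theta) = 0$ at $\theta_0$ leads to

$$\sqrt{J}(\hat\theta - \theta_0) = -\dot g^{-1}(\theta_0)\sqrt{J}\,g(\theta_0) + o_p(1). \tag{A.2}$$

Theorem 1 directly follows from (A.2) by noticing that $A^{(1)} = -E[\dot g(\theta_0)]$ and $B^{(1)} = J\,\mathrm{Var}[g(\theta_0)]$.

Proof of Theorem 2. Following the outline given in the appendix of Yuan and Bentler [36] we have

$$\begin{aligned}\mathrm{Var}(n_j s_j) &= \Big(n_j - 2 + \frac{1}{n_j}\Big)\Gamma_u + \frac{2(n_j - 1)}{n_j}D_p^{+}(\Sigma_w \otimes \Sigma_w)D_p^{+\prime} \\ &= (n_j - 1)V_w + \Big(n_j - 2 + \frac{1}{n_j}\Big)(\Gamma_u - V_w),\end{aligned} \tag{A.3}$$

$$\begin{aligned}\mathrm{Var}(r_j) &= \Gamma_b + \frac{1}{n_j^3}\Gamma_{cuc} + \frac{2}{n_j}D_{p+q}^{+}[(C\Sigma_w C') \otimes \Sigma_b + \Sigma_b \otimes (C\Sigma_w C')]D_{p+q}^{+\prime} + \frac{2(n_j - 1)}{n_j^3}D_{p+q}^{+}[(C\Sigma_w C') \otimes (C\Sigma_w C')]D_{p+q}^{+\prime} \\ &= V_j + (\Gamma_b - V_b) + \frac{1}{n_j^3}(\Gamma_{cuc} - V_{cwc}),\end{aligned} \tag{A.4}$$

$$\begin{aligned} n_j\,\mathrm{Cov}(r_j, s_j) &= \frac{(n_j - 1)}{n_j^2}D_{p+q}^{+}(C \otimes C)[D_p\Gamma_u D_p' - 2(\Sigma_w \otimes \Sigma_w)]D_p^{+\prime} \\ &= \frac{(n_j - 1)}{n_j^2}(\Gamma_{cu} - V_{cw}),\end{aligned} \tag{A.5}$$

$$n_j\,\mathrm{Cov}(t_j, s_j) = \Big(1 - \frac{1}{n_j}\Big)\Delta_{cu}, \tag{A.6}$$

$$\mathrm{Cov}(t_j, r_j) = \Delta_b + \frac{1}{n_j^2}\Delta_{cuc}. \tag{A.7}$$

Combining (4) with (A.3)–(A.7) leads to

$$B^{(1)} = A^{(1)} + E^{(1)}. \tag{A.8}$$

Theorem 2 follows from (5) and (A.8).

Proof of Corollary 1. When $\bar n \to \infty$, it follows from (8) that

$$E^{(1)} = \bar n\,\dot\sigma_w' W_w(\Gamma_u - V_w)W_w\dot\sigma_w + O(1). \tag{A.9}$$

Because $A^{(1)-1} = O(\bar n^{-1})$, the corollary follows from (7) and (A.9).

Proof of Theorem 3. The estimating function corresponding to the parameterization $\theta = (\theta_m', \theta_b', \theta_w')'$ is

$$h(\theta) = \begin{pmatrix} h_1(\theta) \\ h_2(\theta) \\ h_3(\theta) \end{pmatrix},$$

where

$$h_1(\theta) = \frac{1}{J}\sum_{j=1}^{J}\dot\mu'(\theta_m)\Sigma_j^{-1}(\theta)[t_j - \mu(\theta_m)],$$

$$h_2(\theta) = \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_b'(\theta_b)W_j(\theta)[r_j(\theta_m) - \sigma_j(\theta)],$$

$$h_3(\theta) = \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_w'(\theta_w)W_w(\theta_w)[n_j s_j - (n_j - 1)\sigma_w(\theta_w)] + \frac{1}{J}\sum_{j=1}^{J}\frac{1}{n_j}\dot\sigma_w'(\theta_w)W_{cj}(\theta)[r_j(\theta_m) - \sigma_j(\theta)].$$

Because $\hat\theta$ satisfies $h(\hat\theta) = 0$, its asymptotic distribution is characterized by (see [38])

$$\sqrt{J}(\hat\theta - \theta_0) \xrightarrow{L} N(0, \Omega^{(3)}),$$

where

$$\Omega^{(3)} = \lim_{J\to\infty}\dot h^{-1}(\theta_0)\{J\,\mathrm{Var}[h(\theta_0)]\}\dot h^{-1}(\theta_0).$$

It follows from (A.3) to (A.7) that $J\,\mathrm{Var}[h(\theta_0)] = A^{(3)} + E^{(3)}$. The theorem follows by noticing that $A^{(3)} = -E[\dot h(\theta_0)]$.

Proof of Lemma 1. Because the $n_j$'s are evenly distributed on $[n_a + 1, n_a + J]$, we have

$$O\Big(\frac{1}{J}\sum_{j=1}^{J}n_j^{-1}\Big) = O\Big(\frac{1}{J}\sum_{j=1}^{J}(n_a + j)^{-1}\Big) = O\Big(\frac{1}{J}\int_0^J\frac{1}{n_a + x}\,dx\Big) = O[J^{-1}\ln(1 + J/n_a)]. \tag{A.10}$$

It follows from (10) that $A_{mm} = O(1)$, $A_{bb} = O(1)$, $A_{bw} = O[J^{-1}\ln(1 + J/n_a)]$ and $A_{ww} = O(\bar n)$. Applying the rule of matrix inversion for partitioned matrices to (10a), we have

$$A^{mm} = A_{mm}^{-1} = O(1),$$

$$A^{ww} = (A_{ww} - A_{wb}A_{bb}^{-1}A_{bw})^{-1} = \{A_{ww} - O[J^{-2}\ln^2(1 + J/n_a)]\}^{-1} = A_{ww}^{-1} + O[\bar n^{-2}J^{-2}\ln^2(1 + J/n_a)],$$

$$A^{bb} = A_{bb}^{-1} + A_{bb}^{-1}A_{bw}A^{ww}A_{wb}A_{bb}^{-1} = A_{bb}^{-1} + O[\bar n^{-1}J^{-2}\ln^2(1 + J/n_a)],$$

$$A^{bw} = -A_{bb}^{-1}A_{bw}A^{ww} = -A_{bb}^{-1}A_{bw}A_{ww}^{-1} + O[\bar n^{-2}J^{-3}\ln^3(1 + J/n_a)] = O[\bar n^{-1}J^{-1}\ln(1 + J/n_a)].$$

Because $\Sigma_j = \Sigma_b + n_j^{-1}C\Sigma_w C' = \Sigma_b + O(n_j^{-1})$,

$$W_j = W_b + O\Big(\frac{1}{n_j}\Big).$$

Eq. (10c) implies that

$$A_{bb} = \dot\sigma_b' W_b\dot\sigma_b + O[J^{-1}\ln(1 + J/n_a)],$$

and thus (14c) follows. Eq. (10d) implies that

$$A_{ww} = (\bar n - 1)\dot\sigma_w' W_w\dot\sigma_w + O(J^{-1}),$$

and thus (14e) follows.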
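The order in (A.10) can be illustrated numerically. The sketch below assumes cluster sizes exactly equal to $n_j = n_a + j$, $j = 1, \ldots, J$, with hypothetical values of $n_a$ and $J$; the function names are illustrative. Since $1/(n_a + j) \le 1/(n_a + x)$ on $[j - 1, j]$, the sum is bounded above by the integral from 0 to $J$, and the code compares the two sides.

```python
import math

def mean_inverse_size(n_a, J):
    """(1/J) * sum_{j=1}^{J} 1/(n_a + j): average inverse cluster size."""
    return sum(1.0 / (n_a + j) for j in range(1, J + 1)) / J

def log_bound(n_a, J):
    """(1/J) * ln(1 + J/n_a), i.e. (1/J) * integral_0^J dx / (n_a + x)."""
    return math.log(1.0 + J / n_a) / J

# Hypothetical (n_a, J) pairs, chosen only for illustration.
for n_a, J in [(2, 10), (5, 100), (20, 1000)]:
    lhs = mean_inverse_size(n_a, J)
    rhs = log_bound(n_a, J)
    print(n_a, J, lhs, rhs, lhs <= rhs)
```

Both sides shrink as $J$ grows with $n_a$ fixed, which is the behavior used for the remainder terms throughout the appendix.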

Proof of Theorem 4. Direct matrix multiplication in (15c) leads to

$$H_{mm} = 0, \quad H_{mb} = A_{mm}^{-1}(E_{mb}A^{bb} + E_{mw}A^{wb}), \quad H_{mw} = (\bar n - 1)^{1/2}A_{mm}^{-1}(E_{mb}A^{bw} + E_{mw}A^{ww}), \tag{A.11a}$$

$$H_{bb} = A^{bb}(E_{bb}A^{bb} + E_{bw}A^{wb}) + A^{bw}(E_{wb}A^{bb} + E_{ww}A^{wb}), \tag{A.11b}$$

$$H_{ww} = (\bar n - 1)A^{wb}(E_{bb}A^{bw} + E_{bw}A^{ww}) + (\bar n - 1)A^{ww}(E_{wb}A^{bw} + E_{ww}A^{ww}), \tag{A.11c}$$

$$H_{bw} = (\bar n - 1)^{1/2}A^{bb}(E_{bb}A^{bw} + E_{bw}A^{ww}) + (\bar n - 1)^{1/2}A^{bw}(E_{wb}A^{bw} + E_{ww}A^{ww}), \tag{A.11d}$$

$$H_{bm} = H_{mb}', \quad H_{wm} = H_{mw}', \quad H_{wb} = H_{bw}'.$$

Combining (A.10) and (11a)–(11e) yields

$$E_{mb} = \frac{1}{J}\sum_{j=1}^{J}\Delta_b W_j\dot\sigma_b + O(1/J), \tag{A.12a}$$

$$E_{mw} = \frac{1}{J}\sum_{j=1}^{J}\Delta_{cu}W_w\dot\sigma_w + O[J^{-1}\ln(1 + J/n_a)], \tag{A.12b}$$

$$E_{bb} = \frac{1}{J}\sum_{j=1}^{J}\dot\sigma_b' W_j(\Gamma_b - V_b)W_j\dot\sigma_b + O(1/J), \tag{A.12c}$$

$$E_{bw} = O[J^{-1}\ln(1 + J/n_a)] \tag{A.12d}$$

and

$$E_{ww} = (\bar n - 2)\dot\sigma_w' W_w(\Gamma_u - V_w)W_w\dot\sigma_w + O[J^{-1}\ln(1 + J/n_a)]. \tag{A.12e}$$

It follows from Lemma 1, (A.11a) and (A.12a) that

$$H_{mb} = A_{mm}^{-1}\frac{1}{J}\sum_{j=1}^{J}\Delta_b W_j\dot\sigma_b A^{bb} + O(1/J).$$

Combining Lemma 1, (A.11a) and (A.12b) yields

$$H_{mw} = (\bar n - 1)^{1/2}A_{mm}^{-1}\frac{1}{J}\sum_{j=1}^{J}\Delta_{cu}W_w\dot\sigma_w A^{ww} + O[\bar n^{-1/2}J^{-1}\ln(1 + J/n_a)].$$
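The factor $(\bar n - 2)$ in (A.12e) traces back to the coefficient $(n_j - 2 + 1/n_j)$ in (A.3). As a side check, the scalar case of (A.3) can be verified exactly. The sketch below assumes $p = 1$, so that $n_j s_j = \sum_i (y_{ij} - \bar y_j)^2$, $\Gamma_u = \mu_4 - \sigma^4$ and $V_w = 2\sigma^4$; the three-point skewed distribution and the function names are hypothetical choices made for illustration. It enumerates every possible sample with exact rational arithmetic and compares the resulting variance with the right-hand side of (A.3).

```python
from fractions import Fraction
from itertools import product

# Hypothetical skewed three-point distribution for a scalar within-level variable.
VALUES = [Fraction(0), Fraction(1), Fraction(3)]
PROBS = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

def central_moment(k):
    """k-th central moment of the three-point distribution."""
    mean = sum(p * v for v, p in zip(VALUES, PROBS))
    return sum(p * (v - mean) ** k for v, p in zip(VALUES, PROBS))

def exact_var_ns(n):
    """Exact Var of n*s = sum_i (y_i - ybar)^2 by enumerating all samples of size n."""
    first = Fraction(0)
    second = Fraction(0)
    for idx in product(range(len(VALUES)), repeat=n):
        prob = Fraction(1)
        for i in idx:
            prob *= PROBS[i]
        ys = [VALUES[i] for i in idx]
        ybar = sum(ys) / n
        ns = sum((y - ybar) ** 2 for y in ys)
        first += prob * ns
        second += prob * ns * ns
    return second - first * first

def formula_var_ns(n):
    """Scalar version of (A.3): (n-1) V_w + (n - 2 + 1/n)(Gamma_u - V_w)."""
    sigma2 = central_moment(2)
    gamma_u = central_moment(4) - sigma2 ** 2   # Var(u^2) for centered u
    v_w = 2 * sigma2 ** 2                       # normal-theory counterpart
    n = Fraction(n)
    return (n - 1) * v_w + (n - 2 + 1 / n) * (gamma_u - v_w)

for n in (2, 3, 4):
    print(n, exact_var_ns(n) == formula_var_ns(n))
```

Averaging $(n_j - 2 + 1/n_j)$ over clusters gives $\bar n - 2$ plus the average inverse cluster size, which is the quantity whose order is controlled by (A.10).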

It follows from Lemma 1, (A.11b) and (A.12) that

$$H_{bb} = A_{bb}^{-1}\frac{1}{J}\sum_{j=1}^{J}\dot\sigma_b' W_j(\Gamma_b - V_b)W_j\dot\sigma_b A_{bb}^{-1} + O[\bar n^{-1}J^{-2}\ln^2(1 + J/n_a)].$$

It follows from Lemma 1, (A.11c) and (A.12) that

$$\begin{aligned} H_{ww} &= (\bar n - 1)A^{ww}E_{ww}A^{ww} + O[\bar n^{-1}J^{-2}\ln^2(1 + J/n_a)] \\ &= (\bar n - 1)(\bar n - 2)A_{ww}^{-1}[\dot\sigma_w' W_w(\Gamma_u - V_w)W_w\dot\sigma_w]A_{ww}^{-1} + O[\bar n^{-1}J^{-1}\ln(1 + J/n_a)].\end{aligned}$$

Finally, Lemma 1, (A.11d) and (A.12) imply

$$H_{bw} = O[\bar n^{-1/2}J^{-1}\ln(1 + J/n_a)].$$

References

[1] T.W. Anderson, Y. Amemiya, The asymptotic normal distribution of estimators in factor analysis under general conditions, Ann. Statist. 16 (1988) 759–771.
[2] P.M. Bentler, J. Liang, Two-level mean and covariance structures: maximum likelihood via an EM algorithm, in: S. Reise, N. Duan (Eds.), Multilevel Modeling: Methodological Advances, Issues, and Applications, Erlbaum, Mahwah, NJ, 2003, pp. 53–70.
[3] Y.M.M. Bishop, S.E. Fienberg, P.W. Holland, Discrete Multivariate Analysis: Theory and Practice, MIT Press, Cambridge, 1975.
[4] M.W. Browne, Asymptotic distribution-free methods for the analysis of covariance structures, British J. Math. Statist. Psychol. 37 (1984) 62–83.
[5] Y.F. Cheong, R.P. Fotiu, S.W. Raudenbush, Efficiency and robustness of alternative estimators for two- and three-level models: the case of NAEP, J. Educational Behav. Statist. 26 (2001) 411–429.
[6] S. du Toit, M. du Toit, Multilevel structural equation modeling, in: J. de Leeuw (Ed.), The Analysis of Multilevel Models, Springer, New York, pp. 273–322, in press.
[7] K.-T. Fang, K.W. Kotz, Symmetric Multivariate and Related Distributions, Chapman & Hall, London, 1990.
[8] H. Goldstein, Multilevel Statistical Models, second ed., Arnold, London, 2003.
[9] H. Goldstein, R.P. McDonald, A general model for the analysis of multilevel data, Psychometrika 53 (1988) 435–467.
[10] R.H. Heck, S.L. Thomas, An Introduction of Multilevel Modeling Techniques, Erlbaum, Mahwah, NJ, 2000.
[11] J.J. Hox, Multilevel Analysis: Techniques and Applications, Erlbaum, Mahwah, NJ, 2002.
[12] Y. Kano, Consistency property of elliptical probability density functions, J. Multivariate Anal. 51 (1994) 139–147.
[13] I. Kreft, J. de Leeuw, Introducing Multilevel Modeling, Sage, London, 1998.
[14] S.-Y. Lee, Multilevel analysis of structural equation models, Biometrika 77 (1990) 763–772.
[15] S.-Y. Lee, W.-Y. Poon, Analysis of two-level structural equation models via EM type algorithms, Statist. Sinica 8 (1998) 749–766.
[16] J. Liang, P.M. Bentler, An EM algorithm for fitting two-level structural equation models, Psychometrika 69 (2004) 101–122.
[17] N.T. Longford, A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects, Biometrika 74 (1987) 817–827.
[18] N.T. Longford, Regression analysis of multilevel data with measurement error, British J. Math. Statist. Psychol. 46 (1993) 301–311.
[19] J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, revised edition, Wiley, New York, 1999.
[20] J.J. McArdle, F. Hamagami, Multilevel models from a multiple group structural equation perspective, in: G.A. Marcoulides, R.E. Schumacker (Eds.), Advanced Structural Equation Modeling Techniques, Erlbaum, Mahwah, NJ, 1996, pp. 89–124.
[21] R.P. McDonald, H. Goldstein, Balanced versus unbalanced designs for linear structural relations in two-level data, British J. Math. Statist. Psychol. 42 (1989) 215–232.
[22] T. Micceri, The unicorn, the normal curve, and other improbable creatures, Psychol. Bull. 105 (1989) 156–166.
[23] B. Muthén, Multilevel covariance structure analysis, Sociol. Methods Res. 22 (1994) 376–398.
[24] B. Muthén, Latent variable modeling of longitudinal and multilevel data, in: A. Raftery (Ed.), Sociological Methodology, Blackwell Publishers, Boston, 1997, pp. 453–480.
[25] B. Muthén, A. Satorra, Complex sample data in structural equation modeling, in: P.V. Marsden (Ed.), Sociological Methodology 1995, Blackwell Publishers, Cambridge, MA, 1995, pp. 267–316.
[26] W.-Y. Poon, S.-Y. Lee, A distribution free approach for analysis of two-level structural equation model, Comput. Statist. Data Anal. 17 (1994) 265–275.
[27] S.W. Raudenbush, A.S. Bryk, Hierarchical Linear Models, second ed., Sage, Newbury Park, 2002.
[28] A. Satorra, P.M. Bentler, Corrections to test statistics and standard errors in covariance structure analysis, in: A. von Eye, C.C. Clogg (Eds.), Latent Variables Analysis: Applications for Developmental Research, Sage, Thousand Oaks, CA, 1994, pp. 399–419.
[29] J.L. Schafer, J.W. Graham, Missing data: our view of the state of the art, Psychol. Methods 7 (2002) 147–177.
[30] A. Shapiro, M. Browne, Analysis of covariance structures under elliptical distributions, J. Amer. Statist. Assoc. 82 (1987) 1092–1097.
[31] T. Snijders, R. Bosker, Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Sage, Thousand Oaks, CA, 1999.
[32] K.-H. Yuan, P.M. Bentler, On asymptotic distributions of normal theory MLE in covariance structure analysis under some nonnormal distributions, Statist. Probab. Lett. 42 (1999) 107–113.
[33] K.-H. Yuan, P.M. Bentler, On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions, Statist. Sinica 9 (1999) 831–853.
[34] K.-H. Yuan, P.M. Bentler, On normal theory based inference for multilevel models with distributional violations, Psychometrika 67 (2002) 539–561.
[35] K.-H. Yuan, P.M. Bentler, Eight test statistics for multilevel structural equation models, Comput. Statist. Data Anal. 44 (2003) 89–107.
[36] K.-H. Yuan, P.M. Bentler, On the asymptotic distributions of two statistics for two-level covariance structure models within the class of elliptical distributions, Psychometrika 69 (2004) 437–457.
[37] K.-H. Yuan, P.M. Bentler, Asymptotic robustness of the normal theory likelihood ratio statistic for two-level covariance structure models, J. Multivariate Anal. 94 (2005) 328–343.
[38] K.-H. Yuan, R.I. Jennrich, Asymptotics of estimating equations under natural conditions, J. Multivariate Anal. 65 (1998) 245–260.