Multilevel factor analysis modelling using Markov Chain Monte Carlo (MCMC) estimation


Harvey Goldstein and William Browne
Institute of Education, University of London

Abstract

A very general class of multilevel factor analysis and structural equation models is proposed, derived from considering the concatenation of a series of building blocks that use sets of factor structures defined within the levels of a multilevel model. An MCMC estimation algorithm is proposed for this structure to produce parameter chains for point and interval estimates. A limited simulation exercise is presented together with an analysis of a data set.

Keywords: Factor analysis, MCMC, multilevel models, structural equation models

Acknowledgements: This work was partly carried out with the support of a research grant from the Economic and Social Research Council for the development of multilevel models in the Social Sciences. We are very grateful to Ian Langford, Ken Rowe and Ian Plewis for helpful comments.

Correspondence: h.goldstein@ioe.ac.uk

Introduction

Traditional applications of structural equation models have, until recently, ignored complex population data structures. Thus, for example, factor analyses of achievement or ability test scores among students have not adjusted for differences between schools or neighbourhoods. In the case where a substantial part of inter-individual differences can be accounted for by such groupings, inferences which ignore this may be seriously misleading. In the extreme case, if all the variation was due to a combination of school and neighbourhood effects, a failure to adjust for these would lead to the detection of apparent individual level factors which would in fact be non-existent. Recognising this problem, McDonald and Goldstein (1989) present a multilevel factor analysis, and structural equation, model where individuals are recognised as belonging to groups and explicit random effects for group effects are incorporated. They present an algorithm for maximum likelihood estimation. This model was further explored by Longford and Muthen (1992) and McDonald (1993). Raudenbush (1995) applied the EM algorithm to estimation for a 2-level structural equation model. Rowe and Hill (1997, 1998) show how existing multilevel software can be used to provide approximations to maximum likelihood estimates in general multilevel structural equation models.

In the present paper we extend these models in two ways. First, we show how an MCMC algorithm can be used to fit such models. An important feature of the MCMC approach is that it decomposes the computational algorithm into separate steps, for each of which there is a relatively straightforward estimation procedure. This provides a chain sampled from the full posterior distribution of the parameters, from which we can calculate uncertainty intervals based upon quantiles etc. The second advantage is that the decomposition into separate steps allows us easily to extend the procedure to the estimation of very general models, and we illustrate how this can be done.

A fairly general 2-level factor model can be written as follows, using standard factor and multilevel model notation:

$$ Y = \Lambda^{(2)} \nu^{(2)} + u + \Lambda^{(1)} \nu^{(1)} + e, \qquad Y = \{ y_{rij} \}, \quad r = 1,\dots,R, \; i = 1,\dots,n_j, \; j = 1,\dots,J \qquad (1) $$

where the 'uniquenesses' u_rj (level 2) and e_rij (level 1) are mutually independent with covariance matrices Ψ, and there are R response measures. The Λ^(2) and Λ^(1) are the loading matrices for the level-2 and level-1 factors, and the ν^(2)_j and ν^(1)_ij are the, independent, factor vectors at level 2 and level 1. Note that we can have different numbers of factors at each level. We adopt the convention of regarding the measurements themselves as constituting the lowest level of the hierarchy, so that equation (1) is regarded as a 3-level model. Extensions to more levels are straightforward.

Model (1) allows for a factor structure to exist at each level and we need to specify the factor structure further, for example that the factors are orthogonal or patterned, with corresponding identifiability constraints. We can impose further restrictions; for example, we may wish to model the uniquenesses in terms of further explanatory variables. In addition we can add measured covariates to extend to the general case of a linear structural or path model (see discussion).

A simple illustration

To illustrate our procedures we shall begin by considering a simple single level model which we write as

$$ y_{ri} = \lambda_r \nu_i + e_{ri}, \qquad r = 1,\dots,R, \; i = 1,\dots,N, \qquad \nu_i \sim N(0,1), \quad e_{ri} \sim N(0, \sigma^2_{er}) \qquad (2) $$

This can be viewed as a 2-level model with a single level-2 random effect (ν_i) whose variance is constrained to 1, and with R level-1 units for each level-2 unit, each with their own (unique) variance. If we knew the values of the 'loadings' λ_r then we could fit (2) directly as a 2-level model with the loading vector as the explanatory variable for the level-2 variance, which is constrained to be equal to 1; if there are any measured covariates in the model their coefficients could also be estimated at the same time. Conversely, if we knew the values of the random effects ν_i, we could estimate the loadings; this would now be a single level model with each response variate having its own variance. These considerations suggest that an EM algorithm can be used for the estimation, where the ν_i random effects are regarded as missing data (see Rubin and Thayer, 1982).

In this paper we propose a stochastic MCMC algorithm. MCMC works by simulating new values for each unknown parameter in turn from their respective conditional posterior distributions, assuming the other parameters are known. This can be shown to be equivalent (upon convergence) to sampling from the joint posterior distribution. MCMC procedures generally incorporate prior information about parameter values and so are fully Bayesian procedures. In the present paper we shall assume diffuse prior information, although we give algorithms that assume generic prior distributions (see below). Inference is based upon the chain values: conventionally the means of the parameter chains are used as point estimates, but medians and modes (which will often be close to maximum likelihood estimates) are also available, as we shall illustrate.

This procedure has several advantages. In principle it allows us to provide estimates for complex multilevel factor analysis models with exact inferences available. Since the model is an extension of a general multilevel model we can theoretically extend other existing multilevel models in a similar way. Thus, for example, we could consider cross-classified structures and discrete responses, as well as conditioning on measured covariates. Another example is the model proposed by Blozis and Cudeck (1999) where second level residuals in a repeated measures model are assumed to have a factor structure. In the following section we shall describe our procedure by applying it to the simple example of equation (2) and we will then apply it to more complex examples.

A simple implementation of the algorithm

The computations have all been carried out in a development version of the program MLwiN (Rasbash et al., 2000). The essentials of the procedure are described below. We will assume that the factor loadings have Normal prior distributions, p(λ_r) ~ N(λ*_r, σ²*_λr), and that the level-1 variance parameters have independent inverse Gamma priors, p(σ²_er) ~ Γ⁻¹(a*_er, b*_er). The * superscript is used to denote the appropriate parameters of the prior distributions.

This model can be updated using a very simple three step Gibbs sampling algorithm.

Step 1: Update λ_r (r = 1,…,R) from the following distribution:

$$ p(\lambda_r) \sim N(\hat{\lambda}_r, \hat{D}_r), \qquad \hat{D}_r = \left(\frac{\sum_i \nu_i^2}{\sigma^2_{er}} + \frac{1}{\sigma^{2*}_{\lambda r}}\right)^{-1}, \qquad \hat{\lambda}_r = \hat{D}_r\left(\frac{\sum_i \nu_i\, y_{ri}}{\sigma^2_{er}} + \frac{\lambda^{*}_r}{\sigma^{2*}_{\lambda r}}\right). $$

Step 2: Update ν_i (i = 1,…,N) from the following distribution:

$$ p(\nu_i) \sim N(\hat{\nu}_i, \hat{D}_i), \qquad \hat{D}_i = \left(\sum_r \frac{\lambda_r^2}{\sigma^2_{er}} + 1\right)^{-1}, \qquad \hat{\nu}_i = \hat{D}_i \sum_r \frac{\lambda_r\, y_{ri}}{\sigma^2_{er}}. $$

Step 3: Update σ²_er from the following distribution:

$$ p(\sigma^2_{er}) \sim \Gamma^{-1}(\hat{a}_{er}, \hat{b}_{er}), \qquad \hat{a}_{er} = N/2 + a^{*}_{er}, \qquad \hat{b}_{er} = \tfrac{1}{2}\sum_i e^2_{ri} + b^{*}_{er}. $$

To study the performance of the procedure we simulated a small data set from the model

$$ y_{ri} = \lambda_r \nu_i + e_{ri}, \qquad (4) $$

with chosen values (3) of the loading vector λ, the uniqueness matrix Ψ and the sample sizes N and R. The lower triangle of the correlation matrix of the simulated responses was also examined.
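To make the three steps above concrete, the following minimal sketch (in Python, using only numpy) implements the sampler for the single-factor model (4) and applies it to a small simulated data set. The function name, starting values, diffuse prior settings and the simulated loading values are our own illustrative assumptions, not those used in the paper or in MLwiN.

```python
# Minimal sketch of the three-step Gibbs sampler for the single-factor model
# y_ri = lambda_r * nu_i + e_ri,  nu_i ~ N(0,1),  e_ri ~ N(0, sigma2_er).
import numpy as np

rng = np.random.default_rng(1)

def gibbs_single_factor(y, n_iter=5000, a0=0.001, b0=0.001,
                        lam0=0.0, prior_var_lam=1e6):
    """y is an N x R response matrix; returns chains for loadings and unique variances."""
    N, R = y.shape
    lam = np.ones(R)              # starting values for the loadings
    sigma2 = np.full(R, 0.5)      # starting values for the unique variances
    nu = rng.normal(size=N)       # starting values for the factor scores
    chain_lam, chain_sig = [], []
    for _ in range(n_iter):
        # Step 1: loadings lambda_r | nu, sigma2  ~  Normal
        for r in range(R):
            D = 1.0 / (np.sum(nu**2) / sigma2[r] + 1.0 / prior_var_lam)
            m = D * (np.sum(nu * y[:, r]) / sigma2[r] + lam0 / prior_var_lam)
            lam[r] = rng.normal(m, np.sqrt(D))
        # Step 2: factor values nu_i | lambda, sigma2  ~  Normal (prior variance fixed at 1)
        D_nu = 1.0 / (np.sum(lam**2 / sigma2) + 1.0)
        m_nu = D_nu * (y @ (lam / sigma2))
        nu = rng.normal(m_nu, np.sqrt(D_nu))
        # Step 3: unique variances sigma2_er | lambda, nu  ~  inverse Gamma
        for r in range(R):
            e = y[:, r] - lam[r] * nu
            sigma2[r] = 1.0 / rng.gamma(N / 2.0 + a0, 1.0 / (0.5 * np.sum(e**2) + b0))
        chain_lam.append(lam.copy())
        chain_sig.append(sigma2.copy())
    return np.array(chain_lam), np.array(chain_sig)

# Small simulated example in the spirit of the paper's illustration (values assumed)
true_lam = np.array([1.0, 2.0, 3.0, 4.0])
nu_true = rng.normal(size=50)
y = nu_true[:, None] * true_lam + rng.normal(scale=0.5, size=(50, 4))
lam_chain, sig_chain = gibbs_single_factor(y)
print(lam_chain[1000:].mean(axis=0))   # posterior means after discarding a burn in
```

Note that the solution is identified only up to a simultaneous sign flip of the loadings and factor values; starting the loadings at positive values, as here, usually keeps the chain in one of the two equivalent modes.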

All the variables have positively skewed distributions, and the chain loading estimates also have highly significant positive skewness and kurtosis. Starting values were chosen for each loading and for each level-1 variance (uniqueness); good starting values will speed up the convergence of the MCMC chains. Table 1 shows the maximum likelihood estimates produced by the AMOS factor analysis package (Arbuckle, 1997) together with the MCMC results. The factor analysis program carries out a prior standardisation so that the response variates have zero means. In terms of the MCMC algorithm this is equivalent to adding covariates as an 'intercept' term to (4), one for each response variable; these could be estimated by adding an extra step to the above algorithm. Prior centring of the observed responses can be carried out to improve convergence. We have summarised the loading estimates by taking both the means and medians of the chain. The mode can also be computed, but in this data set it is very poorly estimated for the variances and we give it only for the loadings. In fact the likelihood surface with respect to the variances is very flat. The MCMC chains can be summarised using a Normal kernel density smoothing method (Silverman, 1986).
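As an illustration of this way of summarising a chain, the short sketch below computes the mean, median, a quantile-based interval and a kernel-density mode for a single parameter chain, using a Normal kernel with a Silverman-type rule-of-thumb bandwidth. The function name, bandwidth choice, grid size and thinning are illustrative assumptions rather than the settings used in MLwiN.

```python
# Sketch: summarise one MCMC parameter chain by mean, median, interval and a KDE mode.
import numpy as np

def chain_summary(chain, burn_in=1000):
    x = np.asarray(chain, dtype=float)[burn_in:]
    x_kde = x[::max(1, x.size // 5000)]          # thin the chain for the density estimate
    n = x_kde.size
    h = 1.06 * x_kde.std(ddof=1) * n ** (-1 / 5) # Silverman's rule-of-thumb bandwidth
    grid = np.linspace(x.min(), x.max(), 512)
    # Normal kernel density estimate evaluated on the grid
    dens = np.exp(-0.5 * ((grid[:, None] - x_kde[None, :]) / h) ** 2).sum(axis=1)
    dens /= n * h * np.sqrt(2 * np.pi)
    return {"mean": x.mean(),
            "median": np.median(x),
            "mode": grid[np.argmax(dens)],
            "95% interval": tuple(np.quantile(x, [0.025, 0.975]))}
```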

Table 1. Maximum likelihood estimates for the simulated data set together with MCMC estimates (chain of length 50,000 after burn in).

Parameter | ML estimate (s.e.) | MCMC mean estimate (s.d.) | MCMC median estimate | MCMC modal estimate
λ_1 | 0.9 (0.7) | .03 (0.) | |
λ_2 | .4 (0.4) | .7 (0.5) | |
λ_3 | 3.86 (0.57) | 3.9 (0.7) | |
λ_4 | 4.30 (0.7) | 4.8 (0.90) | |
σ²_e1 | 0.5 (0.05) | 0.7 (0.07) | 0.6 |
σ²_e2 | 0.5 (0.09) | 0.3 (0.4) | 0.8 |
σ²_e3 | 0.09 (0.0) | 0.0 (0.7) | 0.06 |
σ²_e4 | 0.43 (0.0) | 0.55 (0.3) | 0.50 |

The estimates and standard errors from the MCMC chain are larger than the maximum likelihood estimates. The standard errors for the latter will generally be underestimates, especially for such a small data set, since they use the estimated (plug-in) parameter values. The distributions for the variances in particular are skew, so that median rather than mean estimates seem preferable. Since we are sampling from the likelihood, the maximum likelihood estimate will be located at the joint parameter mode. We have not computed this, but as can be seen from the loading estimates the univariate modes are closer to the maximum likelihood estimates than the means or medians. Table 2 shows good agreement between the variable means and the fitted intercept terms.

Table 2. Variable means and fitted intercepts.

We have also fitted the structure described by (3) and (4) with a larger simulated data set. The results are given in Table 3 for the maximum likelihood estimates and the means and medians of the MCMC procedure.

Table 3. Model (3) and (4) fitted to the larger simulated data set.

Parameter | ML estimate (s.e.) | MCMC mean estimate (s.d.) | MCMC median estimate | MCMC modal estimate
λ_1 | 0.95 (0.06) | 0.97 (0.06) | |
λ_2 | .86 (0.0) | .89 (0.0) | |
λ_3 | .9 (0.5) | .98 (0.6) | |
λ_4 | 3.86 (0.0) | 3.94 (0.0) | |
σ²_e1 | 0. (0.03) | 0.3 (0.04) | |
σ²_e2 | 0.7 (0.033) | 0.7 (0.033) | |
σ²_e3 | 0.38 (0.058) | 0.38 (0.060) | |
σ²_e4 | 0.39 (0.085) | 0.39 (0.087) | |

We see here a closer agreement. The MCMC estimates are slightly higher than the maximum likelihood ones, with the modal estimates being closest. In more complex examples we may need to run the chain for longer with a longer burn in and also to try more than one chain with different starting values. For example, a conventional single level factor model could be fitted using standard software to obtain approximations to the level-1 loadings and unique variances for use as starting values.

Other procedures

Geweke and Zhou (1996) consider the single level factor model with uncorrelated factors. They use Gibbs sampling and consider identifiability constraints. Zhu and Lee (1999) also consider single level structures, including non-linear models that involve factor products and powers of factors. They use Gibbs steps for the parameters and a Metropolis-Hastings algorithm for simulating from the conditional distribution of the factors. They also provide a goodness-of-fit criterion (see discussion). It appears, however, that their algorithm requires individuals to have complete data vectors with no missing responses, whereas the procedure described in the present paper has no such restriction. Scheines et al. (1999) also use MCMC and take as data the sample covariance matrix, for a single level structure, where covariates are assumed to have been incorporated into the means. They assume a multivariate Normal prior with truncation at zero for the variances.

Rejection sampling is used to produce the posterior distribution. They discuss the problem of identification, and point out that identification issues may be resolved by specifying an informative prior.

McDonald and Goldstein (1989) show how maximum likelihood estimates can be obtained for a 2-level structural equation model. They derive the covariance structure for such a model and show how an efficient algorithm can be constructed to obtain maximum likelihood estimates for the multivariate Normal case. Longford and Muthen (1992) develop this approach. The latter authors, together with Goldstein (1995) and Rowe and Hill (1997, 1998), also point out that consistent estimators can be obtained from a 2-stage process as follows. A 2-level multivariate response linear model is fitted using an efficient procedure such as maximum likelihood. This can be accomplished, for example, as pointed out earlier, by defining a 3-level model where the lowest level is that of the response variables (see Goldstein, 1995, Chapter 8, and model (5) below). This analysis will produce estimates for the (residual) covariance matrices at each level, and each of these can then be structured according to an underlying latent variable model in the usual way. By considering the two matrices as two populations we can also impose constraints on, say, the loadings, using an algorithm for simultaneously fitting structural equations across several populations.

Rabe-Hesketh et al. (2000) consider a general formulation, similar to model (7) below but allowing general link functions, to specify multilevel structural equation generalised linear models (GLLAMM). They consider maximum likelihood estimation using general maximisation algorithms, and a set of macros has been written to implement the model in the program STATA. In the MCMC formulation in this paper, it is possible to deal with incomplete data vectors and also to use informative prior distributions, as described below. Our algorithm can also be extended to the non-linear factor case using a Metropolis-Hastings step when sampling the factor values, as in Zhu and Lee (1999).

General multilevel Bayesian factor models

Extensions to models with further factors, patterned loading matrices and higher levels in the data structure are straightforward. We will consider the 2-level factor model

$$ y_{rij} = \beta_r + \sum_{f=1}^{F} \lambda^{(2)}_{fr}\,\nu^{(2)}_{fj} + \sum_{g=1}^{G} \lambda^{(1)}_{gr}\,\nu^{(1)}_{gij} + u_{rj} + e_{rij}, $$
$$ u_{rj} \sim N(0, \sigma^2_{ur}), \quad e_{rij} \sim N(0, \sigma^2_{er}), \quad \nu^{(2)}_{j} \sim \mathrm{MVN}_F(0, \Omega_2), \quad \nu^{(1)}_{ij} \sim \mathrm{MVN}_G(0, \Omega_1), $$
$$ r = 1,\dots,R, \quad i = 1,\dots,n_j, \quad j = 1,\dots,J, \quad \sum_{j=1}^{J} n_j = N. $$

Here we have R responses for N individuals split between J level-2 units. We have F sets of factors ν^(2)_fj defined at level 2 and G sets of factors ν^(1)_gij defined at level 1. For the fixed part of the model we restrict our algorithm to a single intercept term β_r for each response, although it is easy to extend the algorithm to arbitrary fixed terms. The residuals at levels 1 and 2, e_rij and u_rj, are assumed to be independent.

Although this allows a very flexible set of factor models, it should be noted that in order for such models to be identifiable suitable constraints must be put on the parameters. See Everitt (1984) for further discussion of identifiability. These will consist of fixing the values of some of the elements of the factor variance matrices, Ω_1 and Ω_2, and/or some of the factor loadings, λ^(2)_fr and λ^(1)_gr. The algorithms presented will give steps for all parameters, so any parameter that is constrained will simply maintain its chosen value and will not be updated. We will initially assume that the factor variance matrices, Ω_1 and Ω_2, are known (completely constrained) and then discuss how the algorithm can be extended to encompass partially constrained variance matrices. The parameters in the following steps are those available at the current iteration of the algorithm.

Prior Distributions

For the algorithm we will assume the following general priors:

$$ p(\beta_r) \sim N(\beta^{*}_r, \sigma^{2*}_{\beta r}), \qquad p(\lambda^{(2)}_{fr}) \sim N(\lambda^{(2)*}_{fr}, \sigma^{2*}_{\lambda^{(2)}_{fr}}), \qquad p(\lambda^{(1)}_{gr}) \sim N(\lambda^{(1)*}_{gr}, \sigma^{2*}_{\lambda^{(1)}_{gr}}), $$
$$ p(\sigma^2_{ur}) \sim \Gamma^{-1}(a^{*}_{ur}, b^{*}_{ur}), \qquad p(\sigma^2_{er}) \sim \Gamma^{-1}(a^{*}_{er}, b^{*}_{er}). $$
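Before setting out the sampling steps, the following sketch shows how data could be simulated from the 2-level factor model defined above; generating data in this way is a convenient check on an implementation of the algorithm. All dimensions, loadings and variances in the sketch are arbitrary illustrative choices, not values taken from the paper.

```python
# Sketch: simulate responses from the two-level factor model, with F factors at level 2
# (groups) and G factors at level 1 (individuals), identity factor covariance matrices
# and a zero intercept for each response.
import numpy as np

rng = np.random.default_rng(0)
J, n_j, R, F, G = 30, 20, 6, 1, 2            # groups, group size, responses, factor counts
beta = np.zeros(R)                            # intercepts beta_r
Lam2 = rng.uniform(0.5, 1.0, size=(F, R))     # level-2 loadings lambda^(2)_fr
Lam1 = rng.uniform(0.5, 1.0, size=(G, R))     # level-1 loadings lambda^(1)_gr
sigma2_u = np.full(R, 0.05)                   # level-2 unique variances
sigma2_e = np.full(R, 0.25)                   # level-1 unique variances

rows = []
for j in range(J):
    v2 = rng.normal(size=F)                              # level-2 factor vector (Omega_2 = I)
    u = rng.normal(scale=np.sqrt(sigma2_u))              # level-2 uniquenesses u_rj
    for i in range(n_j):
        v1 = rng.normal(size=G)                          # level-1 factor vector (Omega_1 = I)
        e = rng.normal(scale=np.sqrt(sigma2_e))          # level-1 uniquenesses e_rij
        y = beta + v2 @ Lam2 + u + v1 @ Lam1 + e
        rows.append((j, i, y))
```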

As we are assuming that the factor variance matrices are known, we can use a Gibbs sampling algorithm which involves updating the parameters in turn by generating new values from the following 8 sets of conditional posterior distributions.

Step 1: Update β_r (r = 1,…,R) from the following distribution:

$$ p(\beta_r) \sim N(\hat{\beta}_r, \hat{D}_{\beta r}), \qquad \hat{D}_{\beta r} = \left(\frac{N}{\sigma^2_{er}} + \frac{1}{\sigma^{2*}_{\beta r}}\right)^{-1}, \qquad \hat{\beta}_r = \hat{D}_{\beta r}\left(\frac{\sum_{i,j} d^{(\beta)}_{rij}}{\sigma^2_{er}} + \frac{\beta^{*}_{r}}{\sigma^{2*}_{\beta r}}\right), $$

where d^(β)_rij = e_rij + β_r.

Step 2: Update λ^(2)_fr (r = 1,…,R, f = 1,…,F, where not constrained) from the following distribution:

$$ p(\lambda^{(2)}_{fr}) \sim N(\hat{\lambda}^{(2)}_{fr}, \hat{D}^{(2)}_{fr}), \qquad \hat{D}^{(2)}_{fr} = \left(\frac{\sum_j n_j (\nu^{(2)}_{fj})^2}{\sigma^2_{er}} + \frac{1}{\sigma^{2*}_{\lambda^{(2)}_{fr}}}\right)^{-1}, $$
$$ \hat{\lambda}^{(2)}_{fr} = \hat{D}^{(2)}_{fr}\left(\frac{\sum_{i,j} \nu^{(2)}_{fj}\, d^{(2)}_{frij}}{\sigma^2_{er}} + \frac{\lambda^{(2)*}_{fr}}{\sigma^{2*}_{\lambda^{(2)}_{fr}}}\right), $$

where d^(2)_frij = e_rij + λ^(2)_fr ν^(2)_fj.

Step 3: Update λ^(1)_gr (r = 1,…,R, g = 1,…,G, where not constrained) from the following distribution:

$$ p(\lambda^{(1)}_{gr}) \sim N(\hat{\lambda}^{(1)}_{gr}, \hat{D}^{(1)}_{gr}), \qquad \hat{D}^{(1)}_{gr} = \left(\frac{\sum_{i,j} (\nu^{(1)}_{gij})^2}{\sigma^2_{er}} + \frac{1}{\sigma^{2*}_{\lambda^{(1)}_{gr}}}\right)^{-1}, $$
$$ \hat{\lambda}^{(1)}_{gr} = \hat{D}^{(1)}_{gr}\left(\frac{\sum_{i,j} \nu^{(1)}_{gij}\, d^{(1)}_{grij}}{\sigma^2_{er}} + \frac{\lambda^{(1)*}_{gr}}{\sigma^{2*}_{\lambda^{(1)}_{gr}}}\right), $$

where d^(1)_grij = e_rij + λ^(1)_gr ν^(1)_gij.

Step 4: Update the level-2 factor vector ν^(2)_j (j = 1,…,J) from the following distribution:

$$ p(\nu^{(2)}_{j}) \sim \mathrm{MVN}_F(\hat{\nu}^{(2)}_{j}, \hat{D}^{(2)}_{j}), \qquad \hat{D}^{(2)}_{j} = \left(n_j \sum_{r} \frac{\lambda^{(2)}_{r} (\lambda^{(2)}_{r})^T}{\sigma^2_{er}} + \Omega_2^{-1}\right)^{-1}, \qquad \hat{\nu}^{(2)}_{j} = \hat{D}^{(2)}_{j} \sum_{r} \sum_{i=1}^{n_j} \frac{\lambda^{(2)}_{r}\, d^{(2)}_{rij}}{\sigma^2_{er}}, $$

where d^(2)_rij = e_rij + Σ_{f=1}^{F} λ^(2)_fr ν^(2)_fj, with λ^(2)_r = (λ^(2)_1r, …, λ^(2)_Fr)^T and ν^(2)_j = (ν^(2)_1j, …, ν^(2)_Fj)^T.

Step 5: Update the level-1 factor vector ν^(1)_ij (i = 1,…,n_j, j = 1,…,J) from the following distribution:

$$ p(\nu^{(1)}_{ij}) \sim \mathrm{MVN}_G(\hat{\nu}^{(1)}_{ij}, \hat{D}^{(1)}_{ij}), \qquad \hat{D}^{(1)}_{ij} = \left(\sum_{r} \frac{\lambda^{(1)}_{r} (\lambda^{(1)}_{r})^T}{\sigma^2_{er}} + \Omega_1^{-1}\right)^{-1}, \qquad \hat{\nu}^{(1)}_{ij} = \hat{D}^{(1)}_{ij} \sum_{r} \frac{\lambda^{(1)}_{r}\, d^{(1)}_{rij}}{\sigma^2_{er}}, $$

where d^(1)_rij = e_rij + Σ_{g=1}^{G} λ^(1)_gr ν^(1)_gij, with λ^(1)_r = (λ^(1)_1r, …, λ^(1)_Gr)^T and ν^(1)_ij = (ν^(1)_1ij, …, ν^(1)_Gij)^T.

Step 6: Update u_rj (r = 1,…,R, j = 1,…,J) from the following distribution:

$$ p(u_{rj}) \sim N(\hat{u}_{rj}, \hat{D}^{(u)}_{rj}), \qquad \hat{D}^{(u)}_{rj} = \left(\frac{n_j}{\sigma^2_{er}} + \frac{1}{\sigma^2_{ur}}\right)^{-1}, \qquad \hat{u}_{rj} = \hat{D}^{(u)}_{rj} \sum_{i=1}^{n_j} \frac{d^{(u)}_{rij}}{\sigma^2_{er}}, $$

where d^(u)_rij = e_rij + u_rj.

Step 7: Update σ²_ur from the following distribution:

$$ p(\sigma^2_{ur}) \sim \Gamma^{-1}(\hat{a}_{ur}, \hat{b}_{ur}), \qquad \hat{a}_{ur} = J/2 + a^{*}_{ur}, \qquad \hat{b}_{ur} = \tfrac{1}{2}\sum_j u^2_{rj} + b^{*}_{ur}. $$

Step 8: Update σ²_er from the following distribution:

$$ p(\sigma^2_{er}) \sim \Gamma^{-1}(\hat{a}_{er}, \hat{b}_{er}), \qquad \hat{a}_{er} = N/2 + a^{*}_{er}, \qquad \hat{b}_{er} = \tfrac{1}{2}\sum_{i,j} e^2_{rij} + b^{*}_{er}. $$

Note that the level-1 residuals e_rij can be calculated by subtraction at every step of the algorithm.
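To illustrate how one of these conditional distributions translates into code, the sketch below implements a single draw for Step 4, the level-2 factor vector of one group. The function signature and array layout are our own assumptions; the precision and mean follow the formulae given in Step 4, with the factor covariance matrix Ω_2 treated as known.

```python
# Sketch of one conditional update (Step 4): draw the level-2 factor vector for group j.
import numpy as np

def update_level2_factors(d2_j, Lam2, sigma2_e, Omega2, rng):
    """
    d2_j:     n_j x R array of partial residuals d_rij = e_rij + sum_f lam2_fr v2_fj
    Lam2:     F x R level-2 loading matrix
    sigma2_e: length-R vector of level-1 unique variances
    Omega2:   F x F level-2 factor covariance matrix (fixed/known here)
    Returns one draw of the F-vector v2_j from its conditional posterior.
    """
    n_j, R = d2_j.shape
    # Precision: n_j * sum_r lam_r lam_r^T / sigma2_er  +  Omega2^{-1}
    prec = n_j * (Lam2 / sigma2_e) @ Lam2.T + np.linalg.inv(Omega2)
    D = np.linalg.inv(prec)
    # Mean: D * sum_r sum_i lam_r d_rij / sigma2_er
    m = D @ (Lam2 @ (d2_j.sum(axis=0) / sigma2_e))
    return rng.multivariate_normal(m, D)
```

The remaining steps have the same structure: a Normal or inverse Gamma draw whose precision and mean (or shape and scale) are assembled from sums over the relevant units.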

Unconstrained Factor Variance Matrices

In the general algorithm we have assumed that the factor variances and covariances are all constrained. Typically we will fix the variances to equal 1 and the covariances to equal 0 and so have independent factors. This form will allow us to simplify steps 4 and 5 of the algorithm to univariate Normal updates for each factor separately. We may, however, wish to consider correlations between the factors. Here we will modify our algorithm to allow another special case, where the variances are constrained to be 1 but the covariances can be freely estimated. Where the resulting correlations are estimated to be close to 1 or -1, then we may be fitting too many factors at that particular level. As the variances are constrained to equal 1, the covariances between factors equal the correlations between the factors. This means that each covariance is constrained to lie between -1 and 1. We will consider here only the factor variance matrix at level 2, as the step for the level-1 variance matrix simply involves changing subscripts. We will use the following priors:

$$ p(\Omega_{2,lm}) \sim \mathrm{Uniform}(-1, 1), \qquad l \ne m. $$

Here Ω_{2,lm} is the l,m-th element of the level-2 factor variance matrix. We will update these covariance parameters using a Metropolis step and a Normal random walk proposal (see Browne and Rasbash, in preparation, for more details on using Metropolis-Hastings methods for constrained variance matrices).

Step 9: At iteration t generate Ω*_{2,lm} ~ N(Ω^(t-1)_{2,lm}, σ²_{p,lm}), where σ²_{p,lm} is a proposal distribution variance that has to be set for each covariance. If Ω*_{2,lm} > 1 or < -1, set Ω^(t)_{2,lm} = Ω^(t-1)_{2,lm}, as the proposed covariance is not valid; else form a proposed new matrix Ω*_2 by replacing the l,m-th element of Ω^(t-1)_2 by this proposed value. We then set

$$ \Omega^{(t)}_{2,lm} = \Omega^{*}_{2,lm} \;\text{ with probability }\; \min\bigl(1,\; p(\Omega^{*}_{2} \mid \nu^{(2)}) / p(\Omega^{(t-1)}_{2} \mid \nu^{(2)})\bigr), \qquad \Omega^{(t)}_{2,lm} = \Omega^{(t-1)}_{2,lm} \;\text{ otherwise,} $$

where

$$ p(\Omega_2 \mid \nu^{(2)}) \propto \prod_{j} |\Omega_2|^{-1/2} \exp\bigl(-\tfrac{1}{2}\, (\nu^{(2)}_{j})^T \Omega_2^{-1} \nu^{(2)}_{j}\bigr). $$

This procedure is repeated for each covariance that is not constrained.
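A minimal sketch of this Metropolis step for a single off-diagonal element is given below, assuming the diagonal of the factor covariance matrix is fixed at 1 so that the element is a correlation in (-1, 1). The proposal standard deviation, function names and the extra positive-definiteness check are illustrative assumptions beyond what the step above specifies.

```python
# Sketch of the random-walk Metropolis update (Step 9) for one factor covariance element.
import numpy as np

def log_factor_density(Omega, V):
    """Log joint density of the factor vectors V (one row per level-2 unit) given Omega."""
    J = V.shape[0]
    sign, logdet = np.linalg.slogdet(Omega)
    quad = np.einsum('jf,fg,jg->', V, np.linalg.inv(Omega), V)
    return -0.5 * (J * logdet + quad)

def metropolis_correlation(Omega, V, l, m, prop_sd, rng):
    """One Metropolis update of Omega[l, m] (= Omega[m, l]) under a Uniform(-1, 1) prior."""
    prop = rng.normal(Omega[l, m], prop_sd)
    if abs(prop) >= 1:                      # outside the valid range: keep the current value
        return Omega
    Omega_new = Omega.copy()
    Omega_new[l, m] = Omega_new[m, l] = prop
    if np.linalg.det(Omega_new) <= 0:       # extra safeguard (not in the step above):
        return Omega                        # the proposed matrix must stay positive definite
    log_ratio = log_factor_density(Omega_new, V) - log_factor_density(Omega, V)
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return Omega_new
    return Omega
```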

Missing Data

The examination example that is discussed in this paper has the additional difficulty that individuals have different numbers of responses. This is not a problem for the MCMC methods if we are prepared to assume that missingness is at random, or effectively so by design. This is equivalent to giving the missing data a uniform prior. We then simply have to add an extra Gibbs sampling step to the algorithm to sample the missing values at each iteration. As an illustration we will consider an individual who is missing response r. In a factor model the correlation between responses is explained by the factor terms and, conditional on these terms, the responses for an individual are independent, so the conditional distributions of the missing responses have simple forms.

Step 10: Update each missing y_rij (r = 1,…,R, i = 1,…,n_j, j = 1,…,J) from the following distribution, given the current parameter values:

$$ y_{rij} \sim N(\hat{y}_{rij}, \sigma^2_{er}), \qquad \hat{y}_{rij} = \beta_r + \sum_{f=1}^{F} \lambda^{(2)}_{fr}\,\nu^{(2)}_{fj} + \sum_{g=1}^{G} \lambda^{(1)}_{gr}\,\nu^{(1)}_{gij} + u_{rj}. $$
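The sketch below shows what such an imputation step might look like in code: each missing response is drawn from a Normal distribution centred on its current model prediction. The array layout and names are illustrative assumptions.

```python
# Sketch of the extra Gibbs step for missing responses (Step 10).
import numpy as np

def impute_missing(y, missing_mask, beta, Lam2, V2, Lam1, V1, U, sigma2_e, group, rng):
    """
    y:            N x R response matrix (missing cells may hold any placeholder value)
    missing_mask: boolean N x R array, True where the response is missing
    beta:         length-R intercepts;  Lam2: F x R;  Lam1: G x R loading matrices
    V2:           J x F level-2 factor values;  V1: N x G level-1 factor values
    U:            J x R level-2 uniquenesses;   sigma2_e: length-R level-1 unique variances
    group:        length-N integer array giving each individual's level-2 unit
    """
    yhat = beta + V2[group] @ Lam2 + V1 @ Lam1 + U[group]   # model prediction for every cell
    draws = rng.normal(yhat, np.sqrt(sigma2_e))             # add level-1 residual variation
    y_new = y.copy()
    y_new[missing_mask] = draws[missing_mask]
    return y_new
```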

Example

The example uses a data set discussed by Goldstein (1995, Chapter 4) and consists of a set of responses to a series of 4 test booklets by 439 pupils in 99 schools. Each student responded to a core booklet containing Earth Science, Biology and Physics items and to a further two booklets randomly chosen from three available. Two of these booklets were in Biology and one in Physics. As a result there are 6 possible scores, one in Earth Science, three in Biology and two in Physics, each student having up to five. A full description of the data is given in Goldstein (1995). A multivariate 2-level model fitted to the data gives the maximum likelihood estimates for the means and covariance/correlation matrices shown in Table 4. The model can be written as follows:

$$ y_{ijk} = \sum_{h=1}^{6} \beta_h x_{hijk} + \sum_{h=1}^{6} \gamma_h x_{hijk} z_{jk} + \sum_{h=1}^{6} u_{hjk} x_{hijk} + \sum_{h=1}^{6} v_{hk} x_{hijk}, \qquad (5) $$
$$ (u_{1jk}, \dots, u_{6jk})^T \sim N(0, \Omega_u), \qquad (v_{1k}, \dots, v_{6k})^T \sim N(0, \Omega_v), $$
$$ x_{hijk} = 1 \text{ if } h = i, \; 0 \text{ otherwise}; \qquad z_{jk} = 1 \text{ if a girl}, \; 0 \text{ if a boy}, $$

where i indexes response variables, j indexes students and k indexes schools.
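The device of treating the response variables as the lowest level can be made concrete with a small sketch that stacks each student's observed scores into 'long' form, with an indicator for each response; this is how the separate intercepts, gender differences and random effects in (5) are attached to each test. The input layout and names are assumptions for illustration.

```python
# Sketch of the 'responses as level 1' construction used in model (5).
import numpy as np

def to_long_form(scores, girl, school):
    """
    scores: N x 6 array of test scores with np.nan for booklets a student did not take
    girl:   length-N 0/1 indicator;  school: length-N school identifiers
    Returns rows (school_k, student_j, response_i, indicator vector x, z, y) for observed cells.
    """
    N, R = scores.shape
    rows = []
    for j in range(N):
        for i in range(R):
            if np.isnan(scores[j, i]):
                continue                      # missing by design: simply omit the row
            x = np.zeros(R)
            x[i] = 1.0                        # x_hijk = 1 when h = i, 0 otherwise
            rows.append((school[j], j, i, x, girl[j], scores[j, i]))
    return rows
```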

Table 4. Science attainment estimates.

Fixed | Estimate (s.e.)
Earth Science Core | (0.0076)
Biology Core | 0.7 (0.000)
Biology R3 | (0.009)
Biology R4 | (0.067)
Physics Core | 0.75 (0.08)
Physics R2 | (0.08)
Earth Science Core (girls - boys) | (0.0059)
Biology Core (girls - boys) | (0.0066)
Biology R3 (girls - boys) | (0.05)
Biology R4 (girls - boys) | (0.037)
Physics Core (girls - boys) | (0.0073)
Physics R2 (girls - boys) | (0.06)

Random part: variances on the diagonal and correlations off-diagonal, for the six responses E.Sc. core, Biol. core, Biol R3, Biol R4, Phys. core and Phys. R2, at level 2 (school) and level 1 (student).

We now fit two 2-level factor models to these data, shown in Table 5. We omit the fixed effects in Table 5 since they are very close to those in Table 4. Model A has two factors at level 1 and a single factor at level 2. For illustration we have constrained all the factor variances to be 1.0 and allowed the covariance (correlation) between the level-1 factors to be estimated. Inspection of the correlation structure suggests a model where the first factor at level 1 estimates the loadings for Earth Science and Biology, constraining those for Physics to be zero (the Physics responses have the highest correlation), and where the second factor at level 1 allows only the loadings for Physics to be unconstrained. The high correlation of 0.90 between the factors suggests that perhaps a single factor will be an adequate summary. Although we do not present results, we have also studied a similar structure with two factors at the school level, where the correlation is estimated to be 0.97, strongly suggesting a single factor at that level.

For model B we have separated the three topics of Earth Science, Biology and Physics so that each separately has non-zero loadings on one of three corresponding factors at the student level. This time the high inter-correlation is that between the Biology and Physics booklets, with only moderate correlations (0.49, 0.55) between Earth Science and Biology and between Earth Science and Physics. This suggests that we need at least two factors to describe the student level data and that our preliminary analysis suggesting just one factor can be improved upon. Since our analyses are for illustrative purposes only, we have not pursued further possibilities with these data.

Table 5. Science attainment MCMC factor model estimates.

Parameter | A: Estimate (s.e.) | B: Estimate (s.e.)

Level 1; factor 1 loadings
E.Sc. core | 0.06 (0.004) | 0. (0.0)
Biol. core | 0. (0.004) | 0*
Biol R3 | (0.008) | 0*
Biol R4 | 0. (0.009) | 0*
Phys. core | 0* | 0*
Phys. R2 | 0* | 0*

Level 1; factor 2 loadings
E.Sc. core | 0* | 0*
Biol. core | 0* | 0.0 (0.005)
Biol R3 | 0* | 0.05 (0.008)
Biol R4 | 0* | 0.0 (0.009)
Phys. core | 0. (0.005) | 0*
Phys. R2 | 0. (0.007) | 0*

Level 1; factor 3 loadings
E.Sc. core | - | 0*
Biol. core | - | 0*
Biol R3 | - | 0*
Biol R4 | - | 0*
Phys. core | - | 0. (0.005)
Phys. R2 | - | 0. (0.007)

Level 2; factor 1 loadings
E.Sc. core | 0.04 (0.007) | 0.04 (0.007)
Biol. core | 0.09 (0.008) | 0.09 (0.008)
Biol R3 | 0.05 (0.009) | 0.05 (0.00)
Biol R4 | 0.0 (0.06) | 0.0 (0.06)
Phys. core | 0.0 (0.00) | 0.0 (0.00)
Phys. R2 | 0.09 (0.0) | 0.09 (0.0)

Level 1 residual variances
E.Sc. core | 0.07 (0.00) | (0.004)
Biol. core | 0.05 (0.00) | 0.05 (0.00)
Biol R3 | 0.09 (0.00) | (0.00)
Biol R4 | 0.07 (0.00) | (0.00)
Phys. core | 0.06 (0.00) | 0.06 (0.00)
Phys. R2 | 0.09 (0.00) | (0.00)

Level 2 residual variances
E.Sc. core | 0.00 (0.0005) | 0.00 (0.0005)
Biol. core | (0.0003) | (0.0003)
Biol R3 | 0.00 (0.0008) | 0.00 (0.0008)
Biol R4 | 0.004 (0.00) | 0.00 (0.00)
Phys. core | 0.00 (0.0005) | 0.00 (0.0005)
Phys. R2 | (0.0009) | (0.0009)

Level 1 correlation, factors 1 & 2 | 0.90 (0.03) | 0.55 (0.0)
Level 1 correlation, factors 1 & 3 | - | (0.09)
Level 1 correlation, factors 2 & 3 | - | 0.9 (0.04)

* indicates a constrained parameter. Estimates are from a single MCMC chain after a burn in. Level 1 is student, level 2 is school.

Discussion

This paper has shown how multilevel factor models can be specified and fitted. The MCMC computations allow point and interval estimation, with an advantage over maximum likelihood estimation in that full account is taken of the uncertainty associated with the estimates.

In addition it allows full Bayesian modelling with informative prior distributions, which may be especially useful for identification problems. As pointed out in the introduction, the MCMC algorithm is readily extended to handle the general structural equation case, and further work is being carried out along the following lines. For simplicity we consider the single level model case to illustrate the procedure. A fairly general, single level, structural equation model can be written in the following matrix form (see McDonald, 1985, for some alternative representations):

$$ A_1 v_1 = A_2 v_2 + W, \qquad Y_1 = \Lambda_1 v_1 + U_1, \qquad Y_2 = \Lambda_2 v_2 + U_2, \qquad (6) $$

where Y_1 and Y_2 are observed multivariate vectors of responses, A_1 is a known transformation matrix, often set to the identity matrix, A_2 is a coefficient matrix which specifies a multivariate linear model between the sets of transformed factors v_1 and v_2, Λ_1 and Λ_2 are loadings, U_1 and U_2 are uniquenesses, and W is a random residual vector; W, U_1 and U_2 are mutually independent with zero means. The extension of this model to the multilevel case follows that of the factor model and we shall restrict ourselves to sketching how the MCMC algorithm can be applied to (6). Note that, as before, we can add covariates and measured variables multiplying the latent variable terms, as shown in (7) below. Note also that we can write A_2 as the vector A*_2 by stacking the rows of A_2. For example, if

$$ A_2 = \begin{pmatrix} a_0 & a_1 \\ a_2 & a_3 \end{pmatrix}, \quad \text{then} \quad A_2^{*} = (a_0, a_1, a_2, a_3)^T. $$

The distributional form of the model can be written as

$$ A_1 v_1 \sim \mathrm{MVN}(A_2 v_2, \Sigma_3), \qquad v_1 \sim \mathrm{MVN}(0, \Sigma_{v_1}), \qquad v_2 \sim \mathrm{MVN}(0, \Sigma_{v_2}), $$
$$ Y_1 \sim \mathrm{MVN}(\Lambda_1 v_1, \Sigma_1), \qquad Y_2 \sim \mathrm{MVN}(\Lambda_2 v_2, \Sigma_2), $$

with priors

$$ A_2^{*} \sim \mathrm{MVN}(\hat{A}_2^{*}, \Sigma_A), \qquad \Lambda_1 \sim \mathrm{MVN}(\hat{\Lambda}_1, \Sigma_{\Lambda_1}), \qquad \Lambda_2 \sim \mathrm{MVN}(\hat{\Lambda}_2, \Sigma_{\Lambda_2}), $$

and Σ_1, Σ_2 and Σ_3 having inverse Wishart priors. The coefficient and loading matrices have conditional Normal distributions, as do the factor values. The covariance matrices and uniqueness variance matrices involve steps similar to those given in the earlier algorithm. The extension to two levels and more follows the same general procedure as we have shown earlier.

The model can be generalised further by considering m sets of response variables, Y_1, Y_2, …, Y_m, in (6) and several, linked, multiple group structural relationships, with the k-th relationship having the general form

$$ \sum_{h} V_h^{(k)} A_h^{(k)} = \sum_{g} V_g^{(k)} A_g^{(k)} + W^{(k)}, $$

and the above procedure can be extended for this case. We note that the model for simultaneous factor analysis (or, more generally, structural equation modelling) in several populations is a special case of this model, with the addition of any required constraints on parameter values across populations. We can also generalise to include fixed effects, responses at level 2 and covariates Z_h for the factors, which may be a subset of the fixed effects covariates X:

$$ Y^{(1)} = X\beta + \Lambda^{(2)} v^{(2)} Z^{(2)} + u + \Lambda^{(1)} v^{(1)} Z^{(1)} + e, \qquad Y^{(2)} = \Lambda^{(2)} v^{(2)} Z^{(2)} + u^{(2)}, $$
$$ Y^{(1)} = \{ y^{(1)}_{rij} \}, \quad Y^{(2)} = \{ y^{(2)}_{rj} \}, \quad r = 1,\dots,R, \; i = 1,\dots,n_j, \; j = 1,\dots,J. \qquad (7) $$

The superscript refers to the level at which the measurement exists, so that, for example, y^(1)_1ij and y^(2)_2j refer respectively to the first measurement on the i-th level-1 unit in the j-th level-2 unit (say students and schools) and to the second measurement taken at school level for the j-th school.

Further work is currently being carried out on applying these procedures to non-linear models and specifically to generalised linear models. For simplicity, consider the binomial response logistic model as an illustration. We write

$$ E(y_{ij}) = \pi_{ij} = \bigl[1 + \exp\{-(a_i + \lambda_i \nu_j)\}\bigr]^{-1}, \qquad y_{ij} \sim \mathrm{Bin}(\pi_{ij}, n_{ij}). \qquad (8) $$

The simplest model is the multiple binary response model (n_ij = 1) that is referred to in the psychometric literature as a unidimensional item response model (Goldstein and Wood, 1989; Bartholomew and Knott, 1999). Estimation for this model is not possible using a simple Gibbs sampling algorithm, but as in the standard binomial multilevel case (see Browne, 1998) we could replace any Gibbs steps that do not have standard conditional posterior distributions with Metropolis-Hastings steps.

The issues that surround the specification and interpretation of single level factor and structural equation models are also present in our multilevel versions. Parameter identification has already been discussed; another issue is the boundary, 'Heywood', case. We have observed such solutions occurring where sets of loading parameters tend towards zero or a correlation tends towards 1.0. A final important issue that only affects stochastic procedures is the problem of 'flipping states'. This means that there is not a unique solution even in a 1-factor problem, as the loadings and factor values may all flip their sign to give an equivalent solution. When the number of factors increases there are greater problems as factors may swap over as the chains progress. This means that identifiability is an even greater consideration when using stochastic techniques.

For making inferences about individual parameters or functions of parameters we can use the chain values to provide point and interval estimates. These can also be used to provide large sample Wald tests for sets of parameters. Zhu and Lee (1999) propose a chi-square discrepancy function for evaluating the posterior predictive p-value, which is the Bayesian counterpart of the frequentist p-value statistic (Meng, 1994). In the multilevel case the α-level probability becomes

$$ \hat{p}_B(Y) = p\left(\sum_{i=1}^{J} D(Y_i, \hat{v}_i) \ge \chi^2_{\alpha} \,\middle|\, Y\right), \qquad D(Y_i, \hat{v}_i) = (Y_i - \hat{\theta}_i)^T \Sigma_i^{-1} (Y_i - \hat{\theta}_i), \qquad (9) $$

where Y_i is the vector of responses for the i-th level-2 unit, θ̂_i is its expectation given the current factor and parameter values, and Σ_i is the (non-diagonal) residual covariance matrix.
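As an illustration of such a Metropolis-Hastings step, the sketch below updates a single individual's factor value in the binary response model (8) using a random-walk proposal. The proposal standard deviation, the N(0,1) prior on the factor and the sign convention in the linear predictor are illustrative assumptions.

```python
# Sketch of a Metropolis-within-Gibbs update for one factor value in the binary model (8).
import numpy as np

def log_post_nu(nu, y, a, lam):
    """Log conditional posterior of one individual's factor value nu, given that
    individual's binary responses y and the item parameters a and lam (vectors)."""
    eta = a + lam * nu                               # linear predictor for each item
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return loglik - 0.5 * nu**2                      # N(0,1) prior on the factor

def mh_update_nu(nu, y, a, lam, prop_sd, rng):
    prop = rng.normal(nu, prop_sd)
    log_ratio = log_post_nu(prop, y, a, lam) - log_post_nu(nu, y, a, lam)
    return prop if np.log(rng.uniform()) < min(0.0, log_ratio) else nu
```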

References

Arbuckle, J.L. (1997). AMOS: Version 3.6. Chicago: SmallWaters Corporation.

Bartholomew, D.J. and Knott, M. (1999). Latent variable models and factor analysis (2nd edition). London: Arnold.

Browne, W. (1998). Applying MCMC methods to multilevel models. PhD thesis, University of Bath.

Browne, W. and Rasbash, J. (in preparation). MCMC algorithms for variance matrices with applications in multilevel modelling.

Blozis, S.A. and Cudeck, R. (1999). Conditionally linear mixed-effects models with latent variable covariates. Journal of Educational and Behavioural Statistics, 24.

Clayton, D. and Rasbash, J. (1999). Estimation in large crossed random effect models by data augmentation. Journal of the Royal Statistical Society, A, 162.

Everitt, B.S. (1984). An introduction to latent variable models. London: Chapman and Hall.

Geweke, J. and Zhou, G. (1996). Measuring the pricing error of the arbitrage pricing theory. The Review of Financial Studies, 9.

Goldstein, H. (1995). Multilevel statistical models. London: Edward Arnold.

Goldstein, H. and McDonald, R.P. (1988). A general model for the analysis of multilevel data. Psychometrika, 53 (4).

Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, A, 159.

Goldstein, H., Rasbash, J., Plewis, I., Draper, D., Browne, W., Yang, M., Woodhouse, G. and Healy, M. (1998). A user's guide to MLwiN. Multilevel Models Project, Institute of Education, University of London.

Goldstein, H. and Wood, R. (1989). Five decades of item response modelling. British Journal of Mathematical and Statistical Psychology, 42.

Lindsey, J.K. (1999). Relationships among sample size, model selection and likelihood regions, and scientifically important differences. Journal of the Royal Statistical Society, D, 48.

Longford, N. and Muthen, B.O. (1992). Factor analysis for clustered observations. Psychometrika, 57.

McDonald, R.P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum.

McDonald, R.P. (1993). A general model for two-level data with responses missing at random. Psychometrika, 58.

McDonald, R.P. and Goldstein, H. (1989). Balanced versus unbalanced designs for linear structural relations in two-level data. British Journal of Mathematical and Statistical Psychology, 42, 215-232.

Meng, X.L. (1994). Posterior predictive p-values. Annals of Statistics, 22.

Rabe-Hesketh, S., Pickles, A. and Taylor, C. (2000). sg129: Generalized linear latent and mixed models. Stata Technical Bulletin, 53.

Rasbash, J., Browne, W., Goldstein, H., Yang, M., et al. (2000). A user's guide to MLwiN (second edition). London: Institute of Education.

Raudenbush, S.W. (1995). Maximum likelihood estimation for unbalanced multilevel covariance structure models via the EM algorithm. British Journal of Mathematical and Statistical Psychology, 48.

Rowe, K.J. and Hill, P.W. (1997). Simultaneous estimation of multilevel structural equations to model students' educational progress. Paper presented at the Tenth International Congress for School Effectiveness and School Improvement, Memphis, Tennessee.

Rowe, K.J. and Hill, P.W. (1998). Modelling educational effectiveness in classrooms: the use of multilevel structural equations to model students' progress. Educational Research and Evaluation, 4 (to appear).

Rubin, D.B. and Thayer, D.T. (1982). EM algorithms for ML factor analysis. Psychometrika, 47.

Scheines, R., Hoijtink, H. and Boomsma, A. (1999). Bayesian estimation and testing of structural equation models. Psychometrika, 64, 37-52.

Silverman, B.W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.

Zhu, H.-T. and Lee, S.-Y. (1999). Statistical analysis of nonlinear factor analysis models. British Journal of Mathematical and Statistical Psychology, 52.


More information

INTRODUCTION TO STRUCTURAL EQUATION MODELS

INTRODUCTION TO STRUCTURAL EQUATION MODELS I. Description of the course. INTRODUCTION TO STRUCTURAL EQUATION MODELS A. Objectives and scope of the course. B. Logistics of enrollment, auditing, requirements, distribution of notes, access to programs.

More information

University of Groningen. The multilevel p2 model Zijlstra, B.J.H.; van Duijn, Maria; Snijders, Thomas. Published in: Methodology

University of Groningen. The multilevel p2 model Zijlstra, B.J.H.; van Duijn, Maria; Snijders, Thomas. Published in: Methodology University of Groningen The multilevel p2 model Zijlstra, B.J.H.; van Duijn, Maria; Snijders, Thomas Published in: Methodology IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

Parsimonious Bayesian Factor Analysis when the Number of Factors is Unknown

Parsimonious Bayesian Factor Analysis when the Number of Factors is Unknown Parsimonious Bayesian Factor Analysis when the Number of Factors is Unknown Sylvia Frühwirth-Schnatter and Hedibert Freitas Lopes July 7, 2010 Abstract We introduce a new and general set of identifiability

More information

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis

More information

Dynamic Factor Models and Factor Augmented Vector Autoregressions. Lawrence J. Christiano

Dynamic Factor Models and Factor Augmented Vector Autoregressions. Lawrence J. Christiano Dynamic Factor Models and Factor Augmented Vector Autoregressions Lawrence J Christiano Dynamic Factor Models and Factor Augmented Vector Autoregressions Problem: the time series dimension of data is relatively

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Scaling Neighbourhood Methods

Scaling Neighbourhood Methods Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)

More information

Statistical Inference for Stochastic Epidemic Models

Statistical Inference for Stochastic Epidemic Models Statistical Inference for Stochastic Epidemic Models George Streftaris 1 and Gavin J. Gibson 1 1 Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS,

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University A Markov chain Monte Carlo approach to confirmatory item factor analysis Michael C. Edwards The Ohio State University An MCMC approach to CIFA Overview Motivating examples Intro to Item Response Theory

More information

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual

More information

Application of Plausible Values of Latent Variables to Analyzing BSI-18 Factors. Jichuan Wang, Ph.D

Application of Plausible Values of Latent Variables to Analyzing BSI-18 Factors. Jichuan Wang, Ph.D Application of Plausible Values of Latent Variables to Analyzing BSI-18 Factors Jichuan Wang, Ph.D Children s National Health System The George Washington University School of Medicine Washington, DC 1

More information

Lecture 16: Mixtures of Generalized Linear Models

Lecture 16: Mixtures of Generalized Linear Models Lecture 16: Mixtures of Generalized Linear Models October 26, 2006 Setting Outline Often, a single GLM may be insufficiently flexible to characterize the data Setting Often, a single GLM may be insufficiently

More information

Conditional Marginalization for Exponential Random Graph Models

Conditional Marginalization for Exponential Random Graph Models Conditional Marginalization for Exponential Random Graph Models Tom A.B. Snijders January 21, 2010 To be published, Journal of Mathematical Sociology University of Oxford and University of Groningen; this

More information

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, )

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, ) Econometrica Supplementary Material SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, 653 710) BY SANGHAMITRA DAS, MARK ROBERTS, AND

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

Quantile POD for Hit-Miss Data

Quantile POD for Hit-Miss Data Quantile POD for Hit-Miss Data Yew-Meng Koh a and William Q. Meeker a a Center for Nondestructive Evaluation, Department of Statistics, Iowa State niversity, Ames, Iowa 50010 Abstract. Probability of detection

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

Estimation of Operational Risk Capital Charge under Parameter Uncertainty

Estimation of Operational Risk Capital Charge under Parameter Uncertainty Estimation of Operational Risk Capital Charge under Parameter Uncertainty Pavel V. Shevchenko Principal Research Scientist, CSIRO Mathematical and Information Sciences, Sydney, Locked Bag 17, North Ryde,

More information

Supplementary materials for: Exploratory Structural Equation Modeling

Supplementary materials for: Exploratory Structural Equation Modeling Supplementary materials for: Exploratory Structural Equation Modeling To appear in Hancock, G. R., & Mueller, R. O. (Eds.). (2013). Structural equation modeling: A second course (2nd ed.). Charlotte, NC:

More information

Specification and estimation of exponential random graph models for social (and other) networks

Specification and estimation of exponential random graph models for social (and other) networks Specification and estimation of exponential random graph models for social (and other) networks Tom A.B. Snijders University of Oxford March 23, 2009 c Tom A.B. Snijders (University of Oxford) Models for

More information

MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS 1. INTRODUCTION

MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS 1. INTRODUCTION MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS JAN DE LEEUW ABSTRACT. We study the weighted least squares fixed rank approximation problem in which the weight matrices depend on unknown

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Advanced Statistical Modelling

Advanced Statistical Modelling Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

An Approximate Test for Homogeneity of Correlated Correlation Coefficients

An Approximate Test for Homogeneity of Correlated Correlation Coefficients Quality & Quantity 37: 99 110, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 99 Research Note An Approximate Test for Homogeneity of Correlated Correlation Coefficients TRIVELLORE

More information

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs Model-based cluster analysis: a Defence Gilles Celeux Inria Futurs Model-based cluster analysis Model-based clustering (MBC) consists of assuming that the data come from a source with several subpopulations.

More information