Multilevel factor analysis modelling using Markov Chain Monte Carlo (MCMC) estimation


Harvey Goldstein and William Browne
Institute of Education, University of London

Abstract

A very general class of multilevel factor analysis and structural equation models is proposed, derived from considering the concatenation of a series of building blocks that use sets of factor structures defined within the levels of a multilevel model. An MCMC estimation algorithm is proposed for this structure to produce parameter chains for point and interval estimates. A limited simulation exercise is presented together with an analysis of a data set.

Keywords: Factor analysis, MCMC, multilevel models, structural equation models

Acknowledgements: This work was partly carried out with the support of a research grant from the Economic and Social Research Council for the development of multilevel models in the Social Sciences. We are very grateful to Ian Langford, Ken Rowe and Ian Plewis for helpful comments.

Correspondence: h.goldstein@ioe.ac.uk

Introduction

Traditional applications of structural equation models have, until recently, ignored complex population data structures. Thus, for example, factor analyses of achievement or ability test scores among students have not adjusted for differences between schools or neighbourhoods. In the case where a substantial part of inter-individual differences can be accounted for by such groupings, inferences which ignore this may be seriously misleading. In the extreme case, if all the variation was due to a combination of school and neighbourhood effects, a failure to adjust for these would lead to the detection of apparent individual level factors which would in fact be non-existent. Recognising this problem, McDonald and Goldstein (1989) present a multilevel factor analysis, and structural equation, model where individuals are recognised as belonging to groups and explicit random effects for group effects are incorporated. They present an algorithm for maximum likelihood estimation. This model was further explored by Longford and Muthen (1992) and McDonald (1993). Raudenbush (1995) applied the EM algorithm to estimation for a 2-level structural equation model. Rowe and Hill (1997, 1998) show how existing multilevel software can be used to provide approximations to maximum likelihood estimates in general multilevel structural equation models.

In the present paper we extend these models in two ways. First, we show how an MCMC algorithm can be used to fit such models. An important feature of the MCMC approach is that it decomposes the computational algorithm into separate steps, for each of which there is a relatively straightforward estimation procedure. This provides a chain sampled from the full posterior distribution of the parameters, from which we can calculate uncertainty intervals based upon quantiles etc. The second advantage is that the decomposition into separate steps allows us easily to extend the procedure to the estimation of very general models, and we illustrate how this can be done.

A fairly general 2-level factor model can be written as follows, using standard factor and multilevel model notation:

$$ Y = \Lambda^{(2)} \nu^{(2)} + u + \Lambda^{(1)} \nu^{(1)} + e, \qquad Y = \{ y_{rij} \}, \quad r = 1,\dots,R, \; i = 1,\dots,n_j, \; j = 1,\dots,J \qquad (1) $$

where the 'uniquenesses' u_rj (level 2) and e_rij (level 1) are mutually independent with covariance matrices Ψ, and there are R response measures. The Λ^(2) and Λ^(1) are the loading matrices for the level-2 and level-1 factors, and the ν^(2)_j and ν^(1)_ij are the, independent, factor vectors at level 2 and level 1. Note that we can have different numbers of factors at each level. We adopt the convention of regarding the measurements themselves as constituting the lowest level of the hierarchy, so that equation (1) is regarded as a 3-level model. Extensions to more levels are straightforward.

Model (1) allows for a factor structure to exist at each level and we need to specify the factor structure further, for example that the factors are orthogonal or patterned, with corresponding identifiability constraints. We can impose further restrictions; for example, we may wish to model the uniquenesses in terms of further explanatory variables. In addition we can add measured covariates to extend to the general case of a linear structural or path model (see discussion).

A simple illustration

To illustrate our procedures we shall begin by considering a simple single level model which we write as

$$ y_{ri} = \lambda_r \nu_i + e_{ri}, \qquad r = 1,\dots,R, \; i = 1,\dots,N, \qquad \nu_i \sim N(0,1), \quad e_{ri} \sim N(0, \sigma^2_{er}) \qquad (2) $$

This can be viewed as a 2-level model with a single level-2 random effect (ν_i) whose variance is constrained to 1, and with R level-1 units for each level-2 unit, each with their own (unique) variance. If we knew the values of the 'loadings' λ_r then we could fit (2) directly as a 2-level model with the loading vector as the explanatory variable for the level-2 variance, which is constrained to be equal to 1; if there are any measured covariates in the model their coefficients could also be estimated at the same time. Conversely, if we knew the values of the random effects ν_i, we could estimate the loadings; this would now be a single level model with each response variate having its own variance. These considerations suggest that an EM algorithm can be used for the estimation, where the ν_i random effects are regarded as missing data (see Rubin and Thayer, 1982).

In this paper we propose a stochastic MCMC algorithm. MCMC works by simulating new values for each unknown parameter in turn from their respective conditional posterior distributions, assuming the other parameters are known. This can be shown to be equivalent (upon convergence) to sampling from the joint posterior distribution. MCMC procedures generally incorporate prior information about parameter values and so are fully Bayesian procedures. In the present paper we shall assume diffuse prior information, although we give algorithms that assume generic prior distributions (see below). Inference is based upon the chain values: conventionally the means of the parameter chains are used as point estimates, but medians and modes (which will often be close to maximum likelihood estimates) are also available, as we shall illustrate.

This procedure has several advantages. In principle it allows us to provide estimates for complex multilevel factor analysis models with exact inferences available. Since the model is an extension of a general multilevel model we can theoretically extend other existing multilevel models in a similar way. Thus, for example, we could consider cross-classified structures and discrete responses, as well as conditioning on measured covariates. Another example is the model proposed by Blozis and Cudeck (1999) where second level residuals in a repeated measures model are assumed to have a factor structure. In the following section we shall describe our procedure by applying it to the simple example of equation (2) and we will then apply it to more complex examples.

A simple implementation of the algorithm

The computations have all been carried out in a development version of the program MLwiN (Rasbash et al., 2000). The essentials of the procedure are described below. We will assume that the factor loadings have Normal prior distributions, p(λ_r) ~ N(λ*_r, σ²*_λr), and that the level-1 variance parameters have independent inverse Gamma priors, p(σ²_er) ~ Γ⁻¹(a*_er, b*_er). The * superscript is used to denote the appropriate parameters of the prior distributions.

This model can be updated using a very simple three step Gibbs sampling algorithm.

Step 1: Update λ_r (r = 1,…,R) from the following distribution:

$$ p(\lambda_r) \sim N(\hat{\lambda}_r, \hat{D}_r), \qquad \hat{D}_r = \left(\frac{\sum_i \nu_i^2}{\sigma^2_{er}} + \frac{1}{\sigma^{2*}_{\lambda r}}\right)^{-1}, \qquad \hat{\lambda}_r = \hat{D}_r\left(\frac{\sum_i \nu_i\, y_{ri}}{\sigma^2_{er}} + \frac{\lambda^{*}_r}{\sigma^{2*}_{\lambda r}}\right). $$

Step 2: Update ν_i (i = 1,…,N) from the following distribution:

$$ p(\nu_i) \sim N(\hat{\nu}_i, \hat{D}_i), \qquad \hat{D}_i = \left(\sum_r \frac{\lambda_r^2}{\sigma^2_{er}} + 1\right)^{-1}, \qquad \hat{\nu}_i = \hat{D}_i \sum_r \frac{\lambda_r\, y_{ri}}{\sigma^2_{er}}. $$

Step 3: Update σ²_er from the following distribution:

$$ p(\sigma^2_{er}) \sim \Gamma^{-1}(\hat{a}_{er}, \hat{b}_{er}), \qquad \hat{a}_{er} = N/2 + a^{*}_{er}, \qquad \hat{b}_{er} = \tfrac{1}{2}\sum_i e^2_{ri} + b^{*}_{er}. $$

To study the performance of the procedure we simulated a small data set from the model

$$ y_{ri} = \lambda_r \nu_i + e_{ri}, \qquad (4) $$

with chosen values (3) of the loading vector λ, the uniqueness matrix Ψ and the sample sizes N and R. The lower triangle of the correlation matrix of the simulated responses was also examined.
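To make the three steps above concrete, the following minimal sketch (in Python, using only numpy) implements the sampler for the single-factor model (4) and applies it to a small simulated data set. The function name, starting values, diffuse prior settings and the simulated loading values are our own illustrative assumptions, not those used in the paper or in MLwiN.

```python
# Minimal sketch of the three-step Gibbs sampler for the single-factor model
# y_ri = lambda_r * nu_i + e_ri,  nu_i ~ N(0,1),  e_ri ~ N(0, sigma2_er).
import numpy as np

rng = np.random.default_rng(1)

def gibbs_single_factor(y, n_iter=5000, a0=0.001, b0=0.001,
                        lam0=0.0, prior_var_lam=1e6):
    """y is an N x R response matrix; returns chains for loadings and unique variances."""
    N, R = y.shape
    lam = np.ones(R)              # starting values for the loadings
    sigma2 = np.full(R, 0.5)      # starting values for the unique variances
    nu = rng.normal(size=N)       # starting values for the factor scores
    chain_lam, chain_sig = [], []
    for _ in range(n_iter):
        # Step 1: loadings lambda_r | nu, sigma2  ~  Normal
        for r in range(R):
            D = 1.0 / (np.sum(nu**2) / sigma2[r] + 1.0 / prior_var_lam)
            m = D * (np.sum(nu * y[:, r]) / sigma2[r] + lam0 / prior_var_lam)
            lam[r] = rng.normal(m, np.sqrt(D))
        # Step 2: factor values nu_i | lambda, sigma2  ~  Normal (prior variance fixed at 1)
        D_nu = 1.0 / (np.sum(lam**2 / sigma2) + 1.0)
        m_nu = D_nu * (y @ (lam / sigma2))
        nu = rng.normal(m_nu, np.sqrt(D_nu))
        # Step 3: unique variances sigma2_er | lambda, nu  ~  inverse Gamma
        for r in range(R):
            e = y[:, r] - lam[r] * nu
            sigma2[r] = 1.0 / rng.gamma(N / 2.0 + a0, 1.0 / (0.5 * np.sum(e**2) + b0))
        chain_lam.append(lam.copy())
        chain_sig.append(sigma2.copy())
    return np.array(chain_lam), np.array(chain_sig)

# Small simulated example in the spirit of the paper's illustration (values assumed)
true_lam = np.array([1.0, 2.0, 3.0, 4.0])
nu_true = rng.normal(size=50)
y = nu_true[:, None] * true_lam + rng.normal(scale=0.5, size=(50, 4))
lam_chain, sig_chain = gibbs_single_factor(y)
print(lam_chain[1000:].mean(axis=0))   # posterior means after discarding a burn in
```

Note that the solution is identified only up to a simultaneous sign flip of the loadings and factor values; starting the loadings at positive values, as here, usually keeps the chain in one of the two equivalent modes.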

All the variables have positively skewed distributions, and the chain loading estimates also have highly significant positive skewness and kurtosis. Starting values were chosen for each loading and for each level-1 variance (uniqueness); good starting values will speed up the convergence of the MCMC chains. Table 1 shows the maximum likelihood estimates produced by the AMOS factor analysis package (Arbuckle, 1997) together with the MCMC results. The factor analysis program carries out a prior standardisation so that the response variates have zero means. In terms of the MCMC algorithm this is equivalent to adding covariates as an 'intercept' term to (4), one for each response variable; these could be estimated by adding an extra step to the above algorithm. Prior centring of the observed responses can be carried out to improve convergence. We have summarised the loading estimates by taking both the means and medians of the chain. The mode can also be computed, but in this data set it is very poorly estimated for the variances and we give it only for the loadings. In fact the likelihood surface with respect to the variances is very flat. The MCMC chains can be summarised using a Normal kernel density smoothing method (Silverman, 1986).
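As an illustration of this way of summarising a chain, the short sketch below computes the mean, median, a quantile-based interval and a kernel-density mode for a single parameter chain, using a Normal kernel with a Silverman-type rule-of-thumb bandwidth. The function name, bandwidth choice, grid size and thinning are illustrative assumptions rather than the settings used in MLwiN.

```python
# Sketch: summarise one MCMC parameter chain by mean, median, interval and a KDE mode.
import numpy as np

def chain_summary(chain, burn_in=1000):
    x = np.asarray(chain, dtype=float)[burn_in:]
    x_kde = x[::max(1, x.size // 5000)]          # thin the chain for the density estimate
    n = x_kde.size
    h = 1.06 * x_kde.std(ddof=1) * n ** (-1 / 5) # Silverman's rule-of-thumb bandwidth
    grid = np.linspace(x.min(), x.max(), 512)
    # Normal kernel density estimate evaluated on the grid
    dens = np.exp(-0.5 * ((grid[:, None] - x_kde[None, :]) / h) ** 2).sum(axis=1)
    dens /= n * h * np.sqrt(2 * np.pi)
    return {"mean": x.mean(),
            "median": np.median(x),
            "mode": grid[np.argmax(dens)],
            "95% interval": tuple(np.quantile(x, [0.025, 0.975]))}
```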

Table 1. Maximum likelihood estimates for the simulated data set together with MCMC estimates (chain of length 50,000 after burn in).

Parameter | ML estimate (s.e.) | MCMC mean estimate (s.d.) | MCMC median estimate | MCMC modal estimate
λ_1 | 0.9 (0.7) | .03 (0.) | |
λ_2 | .4 (0.4) | .7 (0.5) | |
λ_3 | 3.86 (0.57) | 3.9 (0.7) | |
λ_4 | 4.30 (0.7) | 4.8 (0.90) | |
σ²_e1 | 0.5 (0.05) | 0.7 (0.07) | 0.6 |
σ²_e2 | 0.5 (0.09) | 0.3 (0.4) | 0.8 |
σ²_e3 | 0.09 (0.0) | 0.0 (0.7) | 0.06 |
σ²_e4 | 0.43 (0.0) | 0.55 (0.3) | 0.50 |

The estimates and standard errors from the MCMC chain are larger than the maximum likelihood estimates. The standard errors for the latter will generally be underestimates, especially for such a small data set, since they use the estimated (plug-in) parameter values. The distributions for the variances in particular are skew, so that median rather than mean estimates seem preferable. Since we are sampling from the likelihood, the maximum likelihood estimate will be located at the joint parameter mode. We have not computed this, but as can be seen from the loading estimates the univariate modes are closer to the maximum likelihood estimates than the means or medians. Table 2 shows good agreement between the variable means and the fitted intercept terms.

Table 2. Variable means and fitted intercepts.

We have also fitted the structure described by (3) and (4) with a larger simulated data set. The results are given in Table 3 for the maximum likelihood estimates and the means and medians of the MCMC procedure.

Table 3. Model (3) and (4) fitted to the larger simulated data set.

Parameter | ML estimate (s.e.) | MCMC mean estimate (s.d.) | MCMC median estimate | MCMC modal estimate
λ_1 | 0.95 (0.06) | 0.97 (0.06) | |
λ_2 | .86 (0.0) | .89 (0.0) | |
λ_3 | .9 (0.5) | .98 (0.6) | |
λ_4 | 3.86 (0.0) | 3.94 (0.0) | |
σ²_e1 | 0. (0.03) | 0.3 (0.04) | |
σ²_e2 | 0.7 (0.033) | 0.7 (0.033) | |
σ²_e3 | 0.38 (0.058) | 0.38 (0.060) | |
σ²_e4 | 0.39 (0.085) | 0.39 (0.087) | |

We see here a closer agreement. The MCMC estimates are slightly higher than the maximum likelihood ones, with the modal estimates being closest. In more complex examples we may need to run the chain for longer with a longer burn in and also to try more than one chain with different starting values. For example, a conventional single level factor model could be fitted using standard software to obtain approximations to the level-1 loadings and unique variances for use as starting values.

Other procedures

Geweke and Zhou (1996) consider the single level factor model with uncorrelated factors. They use Gibbs sampling and consider identifiability constraints. Zhu and Lee (1999) also consider single level structures, including non-linear models that involve factor products and powers of factors. They use Gibbs steps for the parameters and a Metropolis-Hastings algorithm for simulating from the conditional distribution of the factors. They also provide a goodness-of-fit criterion (see discussion). It appears, however, that their algorithm requires individuals to have complete data vectors with no missing responses, whereas the procedure described in the present paper has no such restriction. Scheines et al. (1999) also use MCMC and take as data the sample covariance matrix, for a single level structure, where covariates are assumed to have been incorporated into the means. They assume a multivariate Normal prior with truncation at zero for the variances.

Rejection sampling is used to produce the posterior distribution. They discuss the problem of identification, and point out that identification issues may be resolved by specifying an informative prior.

McDonald and Goldstein (1989) show how maximum likelihood estimates can be obtained for a 2-level structural equation model. They derive the covariance structure for such a model and show how an efficient algorithm can be constructed to obtain maximum likelihood estimates for the multivariate Normal case. Longford and Muthen (1992) develop this approach. The latter authors, together with Goldstein (1995) and Rowe and Hill (1997, 1998), also point out that consistent estimators can be obtained from a 2-stage process as follows. A 2-level multivariate response linear model is fitted using an efficient procedure such as maximum likelihood. This can be accomplished, for example, as pointed out earlier, by defining a 3-level model where the lowest level is that of the response variables (see Goldstein, 1995, Chapter 8, and model (5) below). This analysis will produce estimates for the (residual) covariance matrices at each level, and each of these can then be structured according to an underlying latent variable model in the usual way. By considering the two matrices as two populations we can also impose constraints on, say, the loadings, using an algorithm for simultaneously fitting structural equations across several populations.

Rabe-Hesketh et al. (2000) consider a general formulation, similar to model (7) below but allowing general link functions, to specify multilevel structural equation generalised linear models (GLLAMM). They consider maximum likelihood estimation using general maximisation algorithms, and a set of macros has been written to implement the model in the program STATA. In the MCMC formulation in this paper, it is possible to deal with incomplete data vectors and also to use informative prior distributions, as described below. Our algorithm can also be extended to the non-linear factor case using a Metropolis-Hastings step when sampling the factor values, as in Zhu and Lee (1999).

General multilevel Bayesian factor models

Extensions to models with further factors, patterned loading matrices and higher levels in the data structure are straightforward. We will consider the 2-level factor model

$$ y_{rij} = \beta_r + \sum_{f=1}^{F} \lambda^{(2)}_{fr}\,\nu^{(2)}_{fj} + \sum_{g=1}^{G} \lambda^{(1)}_{gr}\,\nu^{(1)}_{gij} + u_{rj} + e_{rij}, $$
$$ u_{rj} \sim N(0, \sigma^2_{ur}), \quad e_{rij} \sim N(0, \sigma^2_{er}), \quad \nu^{(2)}_{j} \sim \mathrm{MVN}_F(0, \Omega_2), \quad \nu^{(1)}_{ij} \sim \mathrm{MVN}_G(0, \Omega_1), $$
$$ r = 1,\dots,R, \quad i = 1,\dots,n_j, \quad j = 1,\dots,J, \quad \sum_{j=1}^{J} n_j = N. $$

Here we have R responses for N individuals split between J level-2 units. We have F sets of factors ν^(2)_fj defined at level 2 and G sets of factors ν^(1)_gij defined at level 1. For the fixed part of the model we restrict our algorithm to a single intercept term β_r for each response, although it is easy to extend the algorithm to arbitrary fixed terms. The residuals at levels 1 and 2, e_rij and u_rj, are assumed to be independent.

Although this allows a very flexible set of factor models, it should be noted that in order for such models to be identifiable suitable constraints must be put on the parameters. See Everitt (1984) for further discussion of identifiability. These will consist of fixing the values of some of the elements of the factor variance matrices, Ω_1 and Ω_2, and/or some of the factor loadings, λ^(2)_fr and λ^(1)_gr. The algorithms presented will give steps for all parameters, so any parameter that is constrained will simply maintain its chosen value and will not be updated. We will initially assume that the factor variance matrices, Ω_1 and Ω_2, are known (completely constrained) and then discuss how the algorithm can be extended to encompass partially constrained variance matrices. The parameters in the following steps are those available at the current iteration of the algorithm.

Prior Distributions

For the algorithm we will assume the following general priors:

$$ p(\beta_r) \sim N(\beta^{*}_r, \sigma^{2*}_{\beta r}), \qquad p(\lambda^{(2)}_{fr}) \sim N(\lambda^{(2)*}_{fr}, \sigma^{2*}_{\lambda^{(2)}_{fr}}), \qquad p(\lambda^{(1)}_{gr}) \sim N(\lambda^{(1)*}_{gr}, \sigma^{2*}_{\lambda^{(1)}_{gr}}), $$
$$ p(\sigma^2_{ur}) \sim \Gamma^{-1}(a^{*}_{ur}, b^{*}_{ur}), \qquad p(\sigma^2_{er}) \sim \Gamma^{-1}(a^{*}_{er}, b^{*}_{er}). $$
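Before setting out the sampling steps, the following sketch shows how data could be simulated from the 2-level factor model defined above; generating data in this way is a convenient check on an implementation of the algorithm. All dimensions, loadings and variances in the sketch are arbitrary illustrative choices, not values taken from the paper.

```python
# Sketch: simulate responses from the two-level factor model, with F factors at level 2
# (groups) and G factors at level 1 (individuals), identity factor covariance matrices
# and a zero intercept for each response.
import numpy as np

rng = np.random.default_rng(0)
J, n_j, R, F, G = 30, 20, 6, 1, 2            # groups, group size, responses, factor counts
beta = np.zeros(R)                            # intercepts beta_r
Lam2 = rng.uniform(0.5, 1.0, size=(F, R))     # level-2 loadings lambda^(2)_fr
Lam1 = rng.uniform(0.5, 1.0, size=(G, R))     # level-1 loadings lambda^(1)_gr
sigma2_u = np.full(R, 0.05)                   # level-2 unique variances
sigma2_e = np.full(R, 0.25)                   # level-1 unique variances

rows = []
for j in range(J):
    v2 = rng.normal(size=F)                              # level-2 factor vector (Omega_2 = I)
    u = rng.normal(scale=np.sqrt(sigma2_u))              # level-2 uniquenesses u_rj
    for i in range(n_j):
        v1 = rng.normal(size=G)                          # level-1 factor vector (Omega_1 = I)
        e = rng.normal(scale=np.sqrt(sigma2_e))          # level-1 uniquenesses e_rij
        y = beta + v2 @ Lam2 + u + v1 @ Lam1 + e
        rows.append((j, i, y))
```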

As we are assuming that the factor variance matrices are known, we can use a Gibbs sampling algorithm which involves updating the parameters in turn by generating new values from the following 8 sets of conditional posterior distributions.

Step 1: Update β_r (r = 1,…,R) from the following distribution:

$$ p(\beta_r) \sim N(\hat{\beta}_r, \hat{D}_{\beta r}), \qquad \hat{D}_{\beta r} = \left(\frac{N}{\sigma^2_{er}} + \frac{1}{\sigma^{2*}_{\beta r}}\right)^{-1}, \qquad \hat{\beta}_r = \hat{D}_{\beta r}\left(\frac{\sum_{i,j} d^{(\beta)}_{rij}}{\sigma^2_{er}} + \frac{\beta^{*}_{r}}{\sigma^{2*}_{\beta r}}\right), $$

where d^(β)_rij = e_rij + β_r.

Step 2: Update λ^(2)_fr (r = 1,…,R, f = 1,…,F, where not constrained) from the following distribution:

$$ p(\lambda^{(2)}_{fr}) \sim N(\hat{\lambda}^{(2)}_{fr}, \hat{D}^{(2)}_{fr}), \qquad \hat{D}^{(2)}_{fr} = \left(\frac{\sum_j n_j (\nu^{(2)}_{fj})^2}{\sigma^2_{er}} + \frac{1}{\sigma^{2*}_{\lambda^{(2)}_{fr}}}\right)^{-1}, $$
$$ \hat{\lambda}^{(2)}_{fr} = \hat{D}^{(2)}_{fr}\left(\frac{\sum_{i,j} \nu^{(2)}_{fj}\, d^{(2)}_{frij}}{\sigma^2_{er}} + \frac{\lambda^{(2)*}_{fr}}{\sigma^{2*}_{\lambda^{(2)}_{fr}}}\right), $$

where d^(2)_frij = e_rij + λ^(2)_fr ν^(2)_fj.

Step 3: Update λ^(1)_gr (r = 1,…,R, g = 1,…,G, where not constrained) from the following distribution:

$$ p(\lambda^{(1)}_{gr}) \sim N(\hat{\lambda}^{(1)}_{gr}, \hat{D}^{(1)}_{gr}), \qquad \hat{D}^{(1)}_{gr} = \left(\frac{\sum_{i,j} (\nu^{(1)}_{gij})^2}{\sigma^2_{er}} + \frac{1}{\sigma^{2*}_{\lambda^{(1)}_{gr}}}\right)^{-1}, $$
$$ \hat{\lambda}^{(1)}_{gr} = \hat{D}^{(1)}_{gr}\left(\frac{\sum_{i,j} \nu^{(1)}_{gij}\, d^{(1)}_{grij}}{\sigma^2_{er}} + \frac{\lambda^{(1)*}_{gr}}{\sigma^{2*}_{\lambda^{(1)}_{gr}}}\right), $$

where d^(1)_grij = e_rij + λ^(1)_gr ν^(1)_gij.

Step 4: Update the level-2 factor vector ν^(2)_j (j = 1,…,J) from the following distribution:

$$ p(\nu^{(2)}_{j}) \sim \mathrm{MVN}_F(\hat{\nu}^{(2)}_{j}, \hat{D}^{(2)}_{j}), \qquad \hat{D}^{(2)}_{j} = \left(n_j \sum_{r} \frac{\lambda^{(2)}_{r} (\lambda^{(2)}_{r})^T}{\sigma^2_{er}} + \Omega_2^{-1}\right)^{-1}, \qquad \hat{\nu}^{(2)}_{j} = \hat{D}^{(2)}_{j} \sum_{r} \sum_{i=1}^{n_j} \frac{\lambda^{(2)}_{r}\, d^{(2)}_{rij}}{\sigma^2_{er}}, $$

where d^(2)_rij = e_rij + Σ_{f=1}^{F} λ^(2)_fr ν^(2)_fj, with λ^(2)_r = (λ^(2)_1r, …, λ^(2)_Fr)^T and ν^(2)_j = (ν^(2)_1j, …, ν^(2)_Fj)^T.

Step 5: Update the level-1 factor vector ν^(1)_ij (i = 1,…,n_j, j = 1,…,J) from the following distribution:

$$ p(\nu^{(1)}_{ij}) \sim \mathrm{MVN}_G(\hat{\nu}^{(1)}_{ij}, \hat{D}^{(1)}_{ij}), \qquad \hat{D}^{(1)}_{ij} = \left(\sum_{r} \frac{\lambda^{(1)}_{r} (\lambda^{(1)}_{r})^T}{\sigma^2_{er}} + \Omega_1^{-1}\right)^{-1}, \qquad \hat{\nu}^{(1)}_{ij} = \hat{D}^{(1)}_{ij} \sum_{r} \frac{\lambda^{(1)}_{r}\, d^{(1)}_{rij}}{\sigma^2_{er}}, $$

where d^(1)_rij = e_rij + Σ_{g=1}^{G} λ^(1)_gr ν^(1)_gij, with λ^(1)_r = (λ^(1)_1r, …, λ^(1)_Gr)^T and ν^(1)_ij = (ν^(1)_1ij, …, ν^(1)_Gij)^T.

Step 6: Update u_rj (r = 1,…,R, j = 1,…,J) from the following distribution:

$$ p(u_{rj}) \sim N(\hat{u}_{rj}, \hat{D}^{(u)}_{rj}), \qquad \hat{D}^{(u)}_{rj} = \left(\frac{n_j}{\sigma^2_{er}} + \frac{1}{\sigma^2_{ur}}\right)^{-1}, \qquad \hat{u}_{rj} = \hat{D}^{(u)}_{rj} \sum_{i=1}^{n_j} \frac{d^{(u)}_{rij}}{\sigma^2_{er}}, $$

where d^(u)_rij = e_rij + u_rj.

Step 7: Update σ²_ur from the following distribution:

$$ p(\sigma^2_{ur}) \sim \Gamma^{-1}(\hat{a}_{ur}, \hat{b}_{ur}), \qquad \hat{a}_{ur} = J/2 + a^{*}_{ur}, \qquad \hat{b}_{ur} = \tfrac{1}{2}\sum_j u^2_{rj} + b^{*}_{ur}. $$

Step 8: Update σ²_er from the following distribution:

$$ p(\sigma^2_{er}) \sim \Gamma^{-1}(\hat{a}_{er}, \hat{b}_{er}), \qquad \hat{a}_{er} = N/2 + a^{*}_{er}, \qquad \hat{b}_{er} = \tfrac{1}{2}\sum_{i,j} e^2_{rij} + b^{*}_{er}. $$

Note that the level-1 residuals e_rij can be calculated by subtraction at every step of the algorithm.
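To illustrate how one of these conditional distributions translates into code, the sketch below implements a single draw for Step 4, the level-2 factor vector of one group. The function signature and array layout are our own assumptions; the precision and mean follow the formulae given in Step 4, with the factor covariance matrix Ω_2 treated as known.

```python
# Sketch of one conditional update (Step 4): draw the level-2 factor vector for group j.
import numpy as np

def update_level2_factors(d2_j, Lam2, sigma2_e, Omega2, rng):
    """
    d2_j:     n_j x R array of partial residuals d_rij = e_rij + sum_f lam2_fr v2_fj
    Lam2:     F x R level-2 loading matrix
    sigma2_e: length-R vector of level-1 unique variances
    Omega2:   F x F level-2 factor covariance matrix (fixed/known here)
    Returns one draw of the F-vector v2_j from its conditional posterior.
    """
    n_j, R = d2_j.shape
    # Precision: n_j * sum_r lam_r lam_r^T / sigma2_er  +  Omega2^{-1}
    prec = n_j * (Lam2 / sigma2_e) @ Lam2.T + np.linalg.inv(Omega2)
    D = np.linalg.inv(prec)
    # Mean: D * sum_r sum_i lam_r d_rij / sigma2_er
    m = D @ (Lam2 @ (d2_j.sum(axis=0) / sigma2_e))
    return rng.multivariate_normal(m, D)
```

The remaining steps have the same structure: a Normal or inverse Gamma draw whose precision and mean (or shape and scale) are assembled from sums over the relevant units.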

Unconstrained Factor Variance Matrices

In the general algorithm we have assumed that the factor variances and covariances are all constrained. Typically we will fix the variances to equal 1 and the covariances to equal 0 and so have independent factors. This form will allow us to simplify steps 4 and 5 of the algorithm to univariate Normal updates for each factor separately. We may, however, wish to consider correlations between the factors. Here we will modify our algorithm to allow another special case, where the variances are constrained to be 1 but the covariances can be freely estimated. Where the resulting correlations are estimated to be close to 1 or -1, then we may be fitting too many factors at that particular level. As the variances are constrained to equal 1, the covariances between factors equal the correlations between the factors. This means that each covariance is constrained to lie between -1 and 1. We will consider here only the factor variance matrix at level 2, as the step for the level-1 variance matrix simply involves changing subscripts. We will use the following priors:

$$ p(\Omega_{2,lm}) \sim \mathrm{Uniform}(-1, 1), \qquad l \ne m. $$

Here Ω_{2,lm} is the l,m-th element of the level-2 factor variance matrix. We will update these covariance parameters using a Metropolis step and a Normal random walk proposal (see Browne and Rasbash, in preparation, for more details on using Metropolis-Hastings methods for constrained variance matrices).

Step 9: At iteration t generate Ω*_{2,lm} ~ N(Ω^(t-1)_{2,lm}, σ²_{p,lm}), where σ²_{p,lm} is a proposal distribution variance that has to be set for each covariance. If Ω*_{2,lm} > 1 or < -1, set Ω^(t)_{2,lm} = Ω^(t-1)_{2,lm}, as the proposed covariance is not valid; else form a proposed new matrix Ω*_2 by replacing the l,m-th element of Ω^(t-1)_2 by this proposed value. We then set

$$ \Omega^{(t)}_{2,lm} = \Omega^{*}_{2,lm} \;\text{ with probability }\; \min\bigl(1,\; p(\Omega^{*}_{2} \mid \nu^{(2)}) / p(\Omega^{(t-1)}_{2} \mid \nu^{(2)})\bigr), \qquad \Omega^{(t)}_{2,lm} = \Omega^{(t-1)}_{2,lm} \;\text{ otherwise,} $$

where

$$ p(\Omega_2 \mid \nu^{(2)}) \propto \prod_{j} |\Omega_2|^{-1/2} \exp\bigl(-\tfrac{1}{2}\, (\nu^{(2)}_{j})^T \Omega_2^{-1} \nu^{(2)}_{j}\bigr). $$

This procedure is repeated for each covariance that is not constrained.
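A minimal sketch of this Metropolis step for a single off-diagonal element is given below, assuming the diagonal of the factor covariance matrix is fixed at 1 so that the element is a correlation in (-1, 1). The proposal standard deviation, function names and the extra positive-definiteness check are illustrative assumptions beyond what the step above specifies.

```python
# Sketch of the random-walk Metropolis update (Step 9) for one factor covariance element.
import numpy as np

def log_factor_density(Omega, V):
    """Log joint density of the factor vectors V (one row per level-2 unit) given Omega."""
    J = V.shape[0]
    sign, logdet = np.linalg.slogdet(Omega)
    quad = np.einsum('jf,fg,jg->', V, np.linalg.inv(Omega), V)
    return -0.5 * (J * logdet + quad)

def metropolis_correlation(Omega, V, l, m, prop_sd, rng):
    """One Metropolis update of Omega[l, m] (= Omega[m, l]) under a Uniform(-1, 1) prior."""
    prop = rng.normal(Omega[l, m], prop_sd)
    if abs(prop) >= 1:                      # outside the valid range: keep the current value
        return Omega
    Omega_new = Omega.copy()
    Omega_new[l, m] = Omega_new[m, l] = prop
    if np.linalg.det(Omega_new) <= 0:       # extra safeguard (not in the step above):
        return Omega                        # the proposed matrix must stay positive definite
    log_ratio = log_factor_density(Omega_new, V) - log_factor_density(Omega, V)
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return Omega_new
    return Omega
```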

Missing Data

The examination example that is discussed in this paper has the additional difficulty that individuals have different numbers of responses. This is not a problem for the MCMC methods if we are prepared to assume that missingness is at random, or effectively so by design. This is equivalent to giving the missing data a uniform prior. We then simply have to add an extra Gibbs sampling step to the algorithm to sample the missing values at each iteration. As an illustration we will consider an individual who is missing response r. In a factor model the correlation between responses is explained by the factor terms and, conditional on these terms, the responses for an individual are independent, so the conditional distributions of the missing responses have simple forms.

Step 10: Update each missing y_rij (r = 1,…,R, i = 1,…,n_j, j = 1,…,J) from the following distribution, given the current parameter values:

$$ y_{rij} \sim N(\hat{y}_{rij}, \sigma^2_{er}), \qquad \hat{y}_{rij} = \beta_r + \sum_{f=1}^{F} \lambda^{(2)}_{fr}\,\nu^{(2)}_{fj} + \sum_{g=1}^{G} \lambda^{(1)}_{gr}\,\nu^{(1)}_{gij} + u_{rj}. $$
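The sketch below shows what such an imputation step might look like in code: each missing response is drawn from a Normal distribution centred on its current model prediction. The array layout and names are illustrative assumptions.

```python
# Sketch of the extra Gibbs step for missing responses (Step 10).
import numpy as np

def impute_missing(y, missing_mask, beta, Lam2, V2, Lam1, V1, U, sigma2_e, group, rng):
    """
    y:            N x R response matrix (missing cells may hold any placeholder value)
    missing_mask: boolean N x R array, True where the response is missing
    beta:         length-R intercepts;  Lam2: F x R;  Lam1: G x R loading matrices
    V2:           J x F level-2 factor values;  V1: N x G level-1 factor values
    U:            J x R level-2 uniquenesses;   sigma2_e: length-R level-1 unique variances
    group:        length-N integer array giving each individual's level-2 unit
    """
    yhat = beta + V2[group] @ Lam2 + V1 @ Lam1 + U[group]   # model prediction for every cell
    draws = rng.normal(yhat, np.sqrt(sigma2_e))             # add level-1 residual variation
    y_new = y.copy()
    y_new[missing_mask] = draws[missing_mask]
    return y_new
```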

Example

The example uses a data set discussed by Goldstein (1995, Chapter 4) and consists of a set of responses to a series of 4 test booklets by 439 pupils in 99 schools. Each student responded to a core booklet containing Earth Science, Biology and Physics items and to a further two booklets randomly chosen from three available. Two of these booklets were in Biology and one in Physics. As a result there are 6 possible scores, one in Earth Science, three in Biology and two in Physics, each student having up to five. A full description of the data is given in Goldstein (1995). A multivariate 2-level model fitted to the data gives the maximum likelihood estimates for the means and covariance/correlation matrices shown in Table 4. The model can be written as follows:

$$ y_{ijk} = \sum_{h=1}^{6} \beta_h x_{hijk} + \sum_{h=1}^{6} \gamma_h x_{hijk} z_{jk} + \sum_{h=1}^{6} u_{hjk} x_{hijk} + \sum_{h=1}^{6} v_{hk} x_{hijk}, \qquad (5) $$
$$ (u_{1jk}, \dots, u_{6jk})^T \sim N(0, \Omega_u), \qquad (v_{1k}, \dots, v_{6k})^T \sim N(0, \Omega_v), $$
$$ x_{hijk} = 1 \text{ if } h = i, \; 0 \text{ otherwise}; \qquad z_{jk} = 1 \text{ if a girl}, \; 0 \text{ if a boy}, $$

where i indexes response variables, j indexes students and k indexes schools.
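The device of treating the response variables as the lowest level can be made concrete with a small sketch that stacks each student's observed scores into 'long' form, with an indicator for each response; this is how the separate intercepts, gender differences and random effects in (5) are attached to each test. The input layout and names are assumptions for illustration.

```python
# Sketch of the 'responses as level 1' construction used in model (5).
import numpy as np

def to_long_form(scores, girl, school):
    """
    scores: N x 6 array of test scores with np.nan for booklets a student did not take
    girl:   length-N 0/1 indicator;  school: length-N school identifiers
    Returns rows (school_k, student_j, response_i, indicator vector x, z, y) for observed cells.
    """
    N, R = scores.shape
    rows = []
    for j in range(N):
        for i in range(R):
            if np.isnan(scores[j, i]):
                continue                      # missing by design: simply omit the row
            x = np.zeros(R)
            x[i] = 1.0                        # x_hijk = 1 when h = i, 0 otherwise
            rows.append((school[j], j, i, x, girl[j], scores[j, i]))
    return rows
```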

Table 4. Science attainment estimates.

Fixed | Estimate (s.e.)
Earth Science Core | (0.0076)
Biology Core | 0.7 (0.000)
Biology R3 | (0.009)
Biology R4 | (0.067)
Physics Core | 0.75 (0.08)
Physics R2 | (0.08)
Earth Science Core (girls - boys) | (0.0059)
Biology Core (girls - boys) | (0.0066)
Biology R3 (girls - boys) | (0.05)
Biology R4 (girls - boys) | (0.037)
Physics Core (girls - boys) | (0.0073)
Physics R2 (girls - boys) | (0.06)

Random part: variances on the diagonal and correlations off-diagonal, for the six responses E.Sc. core, Biol. core, Biol R3, Biol R4, Phys. core and Phys. R2, at level 2 (school) and level 1 (student).

We now fit two 2-level factor models to these data, shown in Table 5. We omit the fixed effects in Table 5 since they are very close to those in Table 4. Model A has two factors at level 1 and a single factor at level 2. For illustration we have constrained all the factor variances to be 1.0 and allowed the covariance (correlation) between the level-1 factors to be estimated. Inspection of the correlation structure suggests a model where the first factor at level 1 estimates the loadings for Earth Science and Biology, constraining those for Physics to be zero (the Physics responses have the highest correlation), and where the second factor at level 1 allows only the loadings for Physics to be unconstrained. The high correlation of 0.90 between the factors suggests that perhaps a single factor will be an adequate summary. Although we do not present results, we have also studied a similar structure with two factors at the school level, where the correlation is estimated to be 0.97, strongly suggesting a single factor at that level.

For model B we have separated the three topics of Earth Science, Biology and Physics so that each separately has non-zero loadings on one of three corresponding factors at the student level. This time the high inter-correlation is that between the Biology and Physics booklets, with only moderate correlations (0.49, 0.55) between Earth Science and Biology and between Earth Science and Physics. This suggests that we need at least two factors to describe the student level data and that our preliminary analysis suggesting just one factor can be improved upon. Since our analyses are for illustrative purposes only, we have not pursued further possibilities with these data.

Table 5. Science attainment MCMC factor model estimates.

Parameter | A: Estimate (s.e.) | B: Estimate (s.e.)

Level 1; factor 1 loadings
E.Sc. core | 0.06 (0.004) | 0. (0.0)
Biol. core | 0. (0.004) | 0*
Biol R3 | (0.008) | 0*
Biol R4 | 0. (0.009) | 0*
Phys. core | 0* | 0*
Phys. R2 | 0* | 0*

Level 1; factor 2 loadings
E.Sc. core | 0* | 0*
Biol. core | 0* | 0.0 (0.005)
Biol R3 | 0* | 0.05 (0.008)
Biol R4 | 0* | 0.0 (0.009)
Phys. core | 0. (0.005) | 0*
Phys. R2 | 0. (0.007) | 0*

Level 1; factor 3 loadings
E.Sc. core | - | 0*
Biol. core | - | 0*
Biol R3 | - | 0*
Biol R4 | - | 0*
Phys. core | - | 0. (0.005)
Phys. R2 | - | 0. (0.007)

Level 2; factor 1 loadings
E.Sc. core | 0.04 (0.007) | 0.04 (0.007)
Biol. core | 0.09 (0.008) | 0.09 (0.008)
Biol R3 | 0.05 (0.009) | 0.05 (0.00)
Biol R4 | 0.0 (0.06) | 0.0 (0.06)
Phys. core | 0.0 (0.00) | 0.0 (0.00)
Phys. R2 | 0.09 (0.0) | 0.09 (0.0)

Level 1 residual variances
E.Sc. core | 0.07 (0.00) | (0.004)
Biol. core | 0.05 (0.00) | 0.05 (0.00)
Biol R3 | 0.09 (0.00) | (0.00)
Biol R4 | 0.07 (0.00) | (0.00)
Phys. core | 0.06 (0.00) | 0.06 (0.00)
Phys. R2 | 0.09 (0.00) | (0.00)

Level 2 residual variances
E.Sc. core | 0.00 (0.0005) | 0.00 (0.0005)
Biol. core | (0.0003) | (0.0003)
Biol R3 | 0.00 (0.0008) | 0.00 (0.0008)
Biol R4 | 0.004 (0.00) | 0.00 (0.00)
Phys. core | 0.00 (0.0005) | 0.00 (0.0005)
Phys. R2 | (0.0009) | (0.0009)

Level 1 correlation, factors 1 & 2 | 0.90 (0.03) | 0.55 (0.0)
Level 1 correlation, factors 1 & 3 | - | (0.09)
Level 1 correlation, factors 2 & 3 | - | 0.9 (0.04)

* indicates a constrained parameter. Estimates are from a single MCMC chain after a burn in. Level 1 is student, level 2 is school.

Discussion

This paper has shown how multilevel factor models can be specified and fitted. The MCMC computations allow point and interval estimation, with an advantage over maximum likelihood estimation in that full account is taken of the uncertainty associated with the estimates.

In addition it allows full Bayesian modelling with informative prior distributions, which may be especially useful for identification problems. As pointed out in the introduction, the MCMC algorithm is readily extended to handle the general structural equation case, and further work is being carried out along the following lines. For simplicity we consider the single level model case to illustrate the procedure. A fairly general, single level, structural equation model can be written in the following matrix form (see McDonald, 1985, for some alternative representations):

$$ A_1 v_1 = A_2 v_2 + W, \qquad Y_1 = \Lambda_1 v_1 + U_1, \qquad Y_2 = \Lambda_2 v_2 + U_2, \qquad (6) $$

where Y_1 and Y_2 are observed multivariate vectors of responses, A_1 is a known transformation matrix, often set to the identity matrix, A_2 is a coefficient matrix which specifies a multivariate linear model between the sets of transformed factors v_1 and v_2, Λ_1 and Λ_2 are loadings, U_1 and U_2 are uniquenesses, and W is a random residual vector; W, U_1 and U_2 are mutually independent with zero means. The extension of this model to the multilevel case follows that of the factor model and we shall restrict ourselves to sketching how the MCMC algorithm can be applied to (6). Note that, as before, we can add covariates and measured variables multiplying the latent variable terms, as shown in (7) below. Note also that we can write A_2 as the vector A*_2 by stacking the rows of A_2. For example, if

$$ A_2 = \begin{pmatrix} a_0 & a_1 \\ a_2 & a_3 \end{pmatrix}, \quad \text{then} \quad A_2^{*} = (a_0, a_1, a_2, a_3)^T. $$

The distributional form of the model can be written as

$$ A_1 v_1 \sim \mathrm{MVN}(A_2 v_2, \Sigma_3), \qquad v_1 \sim \mathrm{MVN}(0, \Sigma_{v_1}), \qquad v_2 \sim \mathrm{MVN}(0, \Sigma_{v_2}), $$
$$ Y_1 \sim \mathrm{MVN}(\Lambda_1 v_1, \Sigma_1), \qquad Y_2 \sim \mathrm{MVN}(\Lambda_2 v_2, \Sigma_2), $$

with priors

$$ A_2^{*} \sim \mathrm{MVN}(\hat{A}_2^{*}, \Sigma_A), \qquad \Lambda_1 \sim \mathrm{MVN}(\hat{\Lambda}_1, \Sigma_{\Lambda_1}), \qquad \Lambda_2 \sim \mathrm{MVN}(\hat{\Lambda}_2, \Sigma_{\Lambda_2}), $$

and Σ_1, Σ_2 and Σ_3 having inverse Wishart priors. The coefficient and loading matrices have conditional Normal distributions, as do the factor values. The covariance matrices and uniqueness variance matrices involve steps similar to those given in the earlier algorithm. The extension to two levels and more follows the same general procedure as we have shown earlier.

The model can be generalised further by considering m sets of response variables, Y_1, Y_2, …, Y_m, in (6) and several, linked, multiple group structural relationships, with the k-th relationship having the general form

$$ \sum_{h} V_h^{(k)} A_h^{(k)} = \sum_{g} V_g^{(k)} A_g^{(k)} + W^{(k)}, $$

and the above procedure can be extended for this case. We note that the model for simultaneous factor analysis (or, more generally, structural equation modelling) in several populations is a special case of this model, with the addition of any required constraints on parameter values across populations. We can also generalise to include fixed effects, responses at level 2 and covariates Z_h for the factors, which may be a subset of the fixed effects covariates X:

$$ Y^{(1)} = X\beta + \Lambda^{(2)} v^{(2)} Z^{(2)} + u + \Lambda^{(1)} v^{(1)} Z^{(1)} + e, \qquad Y^{(2)} = \Lambda^{(2)} v^{(2)} Z^{(2)} + u^{(2)}, $$
$$ Y^{(1)} = \{ y^{(1)}_{rij} \}, \quad Y^{(2)} = \{ y^{(2)}_{rj} \}, \quad r = 1,\dots,R, \; i = 1,\dots,n_j, \; j = 1,\dots,J. \qquad (7) $$

The superscript refers to the level at which the measurement exists, so that, for example, y^(1)_1ij and y^(2)_2j refer respectively to the first measurement on the i-th level-1 unit in the j-th level-2 unit (say students and schools) and to the second measurement taken at school level for the j-th school.

Further work is currently being carried out on applying these procedures to non-linear models and specifically to generalised linear models. For simplicity, consider the binomial response logistic model as an illustration. We write

$$ E(y_{ij}) = \pi_{ij} = \bigl[1 + \exp\{-(a_i + \lambda_i \nu_j)\}\bigr]^{-1}, \qquad y_{ij} \sim \mathrm{Bin}(\pi_{ij}, n_{ij}). \qquad (8) $$

The simplest model is the multiple binary response model (n_ij = 1) that is referred to in the psychometric literature as a unidimensional item response model (Goldstein and Wood, 1989; Bartholomew and Knott, 1999). Estimation for this model is not possible using a simple Gibbs sampling algorithm, but as in the standard binomial multilevel case (see Browne, 1998) we could replace any Gibbs steps that do not have standard conditional posterior distributions with Metropolis-Hastings steps.

The issues that surround the specification and interpretation of single level factor and structural equation models are also present in our multilevel versions. Parameter identification has already been discussed; another issue is the boundary, 'Heywood', case. We have observed such solutions occurring where sets of loading parameters tend towards zero or a correlation tends towards 1.0. A final important issue that only affects stochastic procedures is the problem of 'flipping states'. This means that there is not a unique solution even in a 1-factor problem, as the loadings and factor values may all flip their sign to give an equivalent solution. When the number of factors increases there are greater problems as factors may swap over as the chains progress. This means that identifiability is an even greater consideration when using stochastic techniques.

For making inferences about individual parameters or functions of parameters we can use the chain values to provide point and interval estimates. These can also be used to provide large sample Wald tests for sets of parameters. Zhu and Lee (1999) propose a chi-square discrepancy function for evaluating the posterior predictive p-value, which is the Bayesian counterpart of the frequentist p-value statistic (Meng, 1994). In the multilevel case the α-level probability becomes

$$ \hat{p}_B(Y) = p\left(\sum_{i=1}^{J} D(Y_i, \hat{v}_i) \ge \chi^2_{\alpha} \,\middle|\, Y\right), \qquad D(Y_i, \hat{v}_i) = (Y_i - \hat{\theta}_i)^T \Sigma_i^{-1} (Y_i - \hat{\theta}_i), \qquad (9) $$

where Y_i is the vector of responses for the i-th level-2 unit, θ̂_i is its expectation given the current factor and parameter values, and Σ_i is the (non-diagonal) residual covariance matrix.
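As an illustration of such a Metropolis-Hastings step, the sketch below updates a single individual's factor value in the binary response model (8) using a random-walk proposal. The proposal standard deviation, the N(0,1) prior on the factor and the sign convention in the linear predictor are illustrative assumptions.

```python
# Sketch of a Metropolis-within-Gibbs update for one factor value in the binary model (8).
import numpy as np

def log_post_nu(nu, y, a, lam):
    """Log conditional posterior of one individual's factor value nu, given that
    individual's binary responses y and the item parameters a and lam (vectors)."""
    eta = a + lam * nu                               # linear predictor for each item
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return loglik - 0.5 * nu**2                      # N(0,1) prior on the factor

def mh_update_nu(nu, y, a, lam, prop_sd, rng):
    prop = rng.normal(nu, prop_sd)
    log_ratio = log_post_nu(prop, y, a, lam) - log_post_nu(nu, y, a, lam)
    return prop if np.log(rng.uniform()) < min(0.0, log_ratio) else nu
```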

References

Arbuckle, J.L. (1997). AMOS: Version 3.6. Chicago: SmallWaters Corporation.

Bartholomew, D.J. and Knott, M. (1999). Latent variable models and factor analysis (2nd edition). London: Arnold.

Browne, W. (1998). Applying MCMC methods to multilevel models. PhD thesis, University of Bath.

Browne, W. and Rasbash, J. (in preparation). MCMC algorithms for variance matrices with applications in multilevel modelling.

Blozis, S.A. and Cudeck, R. (1999). Conditionally linear mixed-effects models with latent variable covariates. Journal of Educational and Behavioural Statistics, 24.

Clayton, D. and Rasbash, J. (1999). Estimation in large crossed random effect models by data augmentation. Journal of the Royal Statistical Society, A, 162.

Everitt, B.S. (1984). An introduction to latent variable models. London: Chapman and Hall.

Geweke, J. and Zhou, G. (1996). Measuring the pricing error of the arbitrage pricing theory. The Review of Financial Studies, 9.

Goldstein, H. (1995). Multilevel statistical models. London: Edward Arnold.

Goldstein, H. and McDonald, R.P. (1988). A general model for the analysis of multilevel data. Psychometrika, 53 (4).

Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, A, 159.

Goldstein, H., Rasbash, J., Plewis, I., Draper, D., Browne, W., Yang, M., Woodhouse, G. and Healy, M. (1998). A user's guide to MLwiN. Multilevel Models Project, Institute of Education, University of London.

Goldstein, H. and Wood, R. (1989). Five decades of item response modelling. British Journal of Mathematical and Statistical Psychology, 42.

Lindsey, J.K. (1999). Relationships among sample size, model selection and likelihood regions, and scientifically important differences. Journal of the Royal Statistical Society, D, 48.

Longford, N. and Muthen, B.O. (1992). Factor analysis for clustered observations. Psychometrika, 57.

McDonald, R.P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum.

McDonald, R.P. (1993). A general model for two-level data with responses missing at random. Psychometrika, 58.

McDonald, R.P. and Goldstein, H. (1989). Balanced versus unbalanced designs for linear structural relations in two-level data. British Journal of Mathematical and Statistical Psychology, 42, 215-232.

Meng, X.L. (1994). Posterior predictive p-values. Annals of Statistics, 22.

Rabe-Hesketh, S., Pickles, A. and Taylor, C. (2000). sg129: Generalized linear latent and mixed models. Stata Technical Bulletin, 53.

Rasbash, J., Browne, W., Goldstein, H., Yang, M., et al. (2000). A user's guide to MLwiN (second edition). London: Institute of Education.

Raudenbush, S.W. (1995). Maximum likelihood estimation for unbalanced multilevel covariance structure models via the EM algorithm. British Journal of Mathematical and Statistical Psychology, 48.

Rowe, K.J. and Hill, P.W. (1997). Simultaneous estimation of multilevel structural equations to model students' educational progress. Paper presented at the Tenth International Congress for School Effectiveness and School Improvement, Memphis, Tennessee.

Rowe, K.J. and Hill, P.W. (1998). Modelling educational effectiveness in classrooms: the use of multilevel structural equations to model students' progress. Educational Research and Evaluation, 4 (to appear).

Rubin, D.B. and Thayer, D.T. (1982). EM algorithms for ML factor analysis. Psychometrika, 47.

Scheines, R., Hoijtink, H. and Boomsma, A. (1999). Bayesian estimation and testing of structural equation models. Psychometrika, 64, 37-52.

Silverman, B.W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.

Zhu, H.-T. and Lee, S.-Y. (1999). Statistical analysis of nonlinear factor analysis models. British Journal of Mathematical and Statistical Psychology, 52.


More information

INTRODUCTION TO STRUCTURAL EQUATION MODELS

INTRODUCTION TO STRUCTURAL EQUATION MODELS I. Description of the course. INTRODUCTION TO STRUCTURAL EQUATION MODELS A. Objectives and scope of the course. B. Logistics of enrollment, auditing, requirements, distribution of notes, access to programs.

More information

University of Groningen. The multilevel p2 model Zijlstra, B.J.H.; van Duijn, Maria; Snijders, Thomas. Published in: Methodology

University of Groningen. The multilevel p2 model Zijlstra, B.J.H.; van Duijn, Maria; Snijders, Thomas. Published in: Methodology University of Groningen The multilevel p2 model Zijlstra, B.J.H.; van Duijn, Maria; Snijders, Thomas Published in: Methodology IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

Parsimonious Bayesian Factor Analysis when the Number of Factors is Unknown

Parsimonious Bayesian Factor Analysis when the Number of Factors is Unknown Parsimonious Bayesian Factor Analysis when the Number of Factors is Unknown Sylvia Frühwirth-Schnatter and Hedibert Freitas Lopes July 7, 2010 Abstract We introduce a new and general set of identifiability

More information

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis

More information

Dynamic Factor Models and Factor Augmented Vector Autoregressions. Lawrence J. Christiano

Dynamic Factor Models and Factor Augmented Vector Autoregressions. Lawrence J. Christiano Dynamic Factor Models and Factor Augmented Vector Autoregressions Lawrence J Christiano Dynamic Factor Models and Factor Augmented Vector Autoregressions Problem: the time series dimension of data is relatively

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Scaling Neighbourhood Methods

Scaling Neighbourhood Methods Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)

More information

Statistical Inference for Stochastic Epidemic Models

Statistical Inference for Stochastic Epidemic Models Statistical Inference for Stochastic Epidemic Models George Streftaris 1 and Gavin J. Gibson 1 1 Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS,

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University A Markov chain Monte Carlo approach to confirmatory item factor analysis Michael C. Edwards The Ohio State University An MCMC approach to CIFA Overview Motivating examples Intro to Item Response Theory

More information

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual

More information

Application of Plausible Values of Latent Variables to Analyzing BSI-18 Factors. Jichuan Wang, Ph.D

Application of Plausible Values of Latent Variables to Analyzing BSI-18 Factors. Jichuan Wang, Ph.D Application of Plausible Values of Latent Variables to Analyzing BSI-18 Factors Jichuan Wang, Ph.D Children s National Health System The George Washington University School of Medicine Washington, DC 1

More information

Lecture 16: Mixtures of Generalized Linear Models

Lecture 16: Mixtures of Generalized Linear Models Lecture 16: Mixtures of Generalized Linear Models October 26, 2006 Setting Outline Often, a single GLM may be insufficiently flexible to characterize the data Setting Often, a single GLM may be insufficiently

More information

Conditional Marginalization for Exponential Random Graph Models

Conditional Marginalization for Exponential Random Graph Models Conditional Marginalization for Exponential Random Graph Models Tom A.B. Snijders January 21, 2010 To be published, Journal of Mathematical Sociology University of Oxford and University of Groningen; this

More information

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, )

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, ) Econometrica Supplementary Material SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, 653 710) BY SANGHAMITRA DAS, MARK ROBERTS, AND

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

Quantile POD for Hit-Miss Data

Quantile POD for Hit-Miss Data Quantile POD for Hit-Miss Data Yew-Meng Koh a and William Q. Meeker a a Center for Nondestructive Evaluation, Department of Statistics, Iowa State niversity, Ames, Iowa 50010 Abstract. Probability of detection

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

Estimation of Operational Risk Capital Charge under Parameter Uncertainty

Estimation of Operational Risk Capital Charge under Parameter Uncertainty Estimation of Operational Risk Capital Charge under Parameter Uncertainty Pavel V. Shevchenko Principal Research Scientist, CSIRO Mathematical and Information Sciences, Sydney, Locked Bag 17, North Ryde,

More information

Supplementary materials for: Exploratory Structural Equation Modeling

Supplementary materials for: Exploratory Structural Equation Modeling Supplementary materials for: Exploratory Structural Equation Modeling To appear in Hancock, G. R., & Mueller, R. O. (Eds.). (2013). Structural equation modeling: A second course (2nd ed.). Charlotte, NC:

More information

Specification and estimation of exponential random graph models for social (and other) networks

Specification and estimation of exponential random graph models for social (and other) networks Specification and estimation of exponential random graph models for social (and other) networks Tom A.B. Snijders University of Oxford March 23, 2009 c Tom A.B. Snijders (University of Oxford) Models for

More information

MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS 1. INTRODUCTION

MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS 1. INTRODUCTION MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS JAN DE LEEUW ABSTRACT. We study the weighted least squares fixed rank approximation problem in which the weight matrices depend on unknown

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Advanced Statistical Modelling

Advanced Statistical Modelling Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

An Approximate Test for Homogeneity of Correlated Correlation Coefficients

An Approximate Test for Homogeneity of Correlated Correlation Coefficients Quality & Quantity 37: 99 110, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 99 Research Note An Approximate Test for Homogeneity of Correlated Correlation Coefficients TRIVELLORE

More information

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs Model-based cluster analysis: a Defence Gilles Celeux Inria Futurs Model-based cluster analysis Model-based clustering (MBC) consists of assuming that the data come from a source with several subpopulations.

More information