Application of the Stochastic EM Method to Latent Regression Models


Research Report RR-04-34

Application of the Stochastic EM Method to Latent Regression Models

Matthias von Davier
Sandip Sinharay

Research & Development
August 2004


Application of the Stochastic EM Method to Latent Regression Models

Matthias von Davier and Sandip Sinharay
ETS, Princeton, NJ

August 2004

ETS Research Reports provide preliminary and limited dissemination of ETS research prior to publication.

Abstract

The reporting methods used in large scale assessments such as the National Assessment of Educational Progress (NAEP) rely on a latent regression model. The first component of the model consists of a p-scale IRT measurement model that defines the response probabilities on a set of cognitive items in p scales depending on a p-dimensional latent trait variable θ = (θ_1, ..., θ_p). In the second component, the conditional distribution of this latent trait variable θ is modeled by a multivariate, multiple regression on a set of predictor variables, which are usually based on student, school, and teacher variables in assessments such as NAEP. In order to fit the latent regression model using the maximum (marginal) likelihood estimation technique, multivariate integrals have to be evaluated. In the computer program MGROUP used by ETS for fitting the latent regression model to data from NAEP and other sources, the integration is currently done either by numerical quadrature (for problems up to two dimensions) or by an approximation of the integral. CGROUP, the current operational version of the MGROUP program used in NAEP and other assessments since 1993, is based on a Laplace approximation that may not provide fully satisfactory results, especially if the number of items per scale is small. This paper examines the application of stochastic expectation-maximization (EM) methods (where an integral is approximated by an average over a random sample) to NAEP-like settings. We present a comparison of CGROUP with a promising implementation of the stochastic EM algorithm that utilizes importance sampling. Simulation studies and real data analysis show that the stochastic EM method provides a viable alternative to CGROUP for fitting multivariate latent regression models.

Key words: CGROUP, NAEP, importance sampling, stochastic integration

Acknowledgements

The authors thank John Mazzeo, Shelby Haberman, Alina von Davier, Neal Thomas, Andreas Oranje, Ying Jin, and Matthew Johnson for useful advice, Steve Isham for help with the data sets used in the analysis, and Kim Fryer for help with proofreading.

1 Introduction

The National Assessment of Educational Progress (NAEP), the only regularly administered and congressionally mandated national assessment program (see, e.g., Beaton & Zwick, 1992), is an ongoing survey of the academic achievement of school students in the United States in a number of subject areas such as reading, writing, mathematics, and the like. It is administered by the National Center for Education Statistics (NCES), a part of the U.S. Department of Education, to a selected sample of students in Grades 4, 8, and 12. A document called The Nation's Report Card reports the results of NAEP for a number of academic subjects on a regular basis. A comparison is provided to the previous assessments in each subject area. Academic achievement is described as average student proficiency for all students in the U.S. and as the percentage of students attaining fixed levels of proficiency (the levels being defined by the National Assessment Governing Board as what students should know at each grade level) in different subjects. In addition to producing these numbers for the nation as a whole, NAEP reports the same results for different subpopulations (based on gender, race, school type, etc.) of the student population. NAEP is prohibited by law from reporting results for individual students, schools, or school districts and is designed to obtain optimal estimates of subpopulation characteristics rather than those of individual performance. To assess national performance in a valid way, NAEP must sample a wide and diverse body of student knowledge. To avoid the burden involved with presenting each item to every examinee, NAEP selects students randomly from designated grade and age populations (first, a sample of schools is selected according to a detailed stratified sampling plan, as mentioned in, e.g., Beaton & Zwick, 1992, and then students are sampled within the schools). NAEP then administers one of many possible booklets of items to each student.
This process is sometimes referred to as matrix sampling of items. For example, the 2000 NAEP mathematics assessment at grade 4 contained 173 items split across 26 booklets. Each item was developed to measure one of five subscales: (a) Number and Operations, (b) Measurements, (c) Geometry, (d) Data Analysis, and (e) Algebra. An item can be multiple-choice or constructed-response. Background (demographic) information is collected on the students through questionnaires that are

filled out by students, teachers, and school administrators. For example, the questionnaires collected information on 381 background variables for each student in the assessment. The above description clearly shows that NAEP's design and implementation are fundamentally different from those of a typical large scale testing program. NAEP reports were originally envisaged as simple lists of percents correct for individual survey items (Mislevy, Johnson, & Muraki, 1992), in the population as a whole and in subpopulations of particular interest. However, it was realized later that this approach has severe limitations (e.g., comparison is limited to groups of items common to student subpopulations), and that major features of the detailed results from hundreds of items and hundreds of background variables could not be effectively communicated without some kind of statistical modeling. As a result, starting in 1984, NAEP reporting methods used a statistical model consisting of two components: (a) an item response theory (IRT) model, and (b) a linear regression model (see, e.g., Beaton, 1987; Mislevy et al., 1992). Other large scale educational assessments such as the International Adult Literacy Study (IALS; Kirsch, 2001), the Trends in Mathematics and Science Study (TIMSS; Martin & Kelly, 1996), and the Progress in International Reading Literacy Study (PIRLS; Mullis, Martin, Gonzalez, & Kennedy, 2003) also adopted essentially the same model. This model is referred to as the conditioning model, the multilevel IRT model, or the latent regression model. An algorithm for estimating the parameters of this model is implemented in the MGROUP set of programs, an ETS product. The first component of the model (the IRT part) defines the responses of examinees to a set of cognitive items as dependent on a p-dimensional latent trait/proficiency vector θ = (θ_1, ..., θ_p).
In the second component (the linear regression part), the distribution of the proficiency variable θ is modeled by a multivariate, multiple regression on a set of background/predictor variables, which contain information on the respective students, schools, teachers, etc. In order to compute maximum (marginal) likelihood estimates of the parameters of this latent regression model, the proficiency θ (usually multivariate) has to be integrated out; this makes estimation for the model problematic. Mislevy (1984, 1985) shows that the maximum likelihood estimates can be obtained using an expectation-maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977). The algorithm requires the values of the

posterior means and the posterior variances for the examinee proficiency parameters. For problems up to two dimensions, the integration is computed using numerical quadrature by the BGROUP version (Beaton, 1987) of the MGROUP program in operational settings in NAEP. For higher dimensions, no numerical integration routine is available operationally (although it may now be possible to perform numerical integration for higher dimensions; work on that is in progress), and an approximation of the integral is used. The CGROUP version of MGROUP, the current operational procedure used in NAEP and other assessments since 1993, is based on the Laplace approximation (Kass & Steffey, 1989), which ignores the higher order derivatives of the examinee posterior distribution and may not provide accurate results, especially in higher dimensions (e.g., a graphical plot for a data example in Thomas, 1993, shows that CGROUP overestimates the high examinee posterior variances). A number of extensions to the CGROUP version of MGROUP and proposals for alternative estimation methods have been suggested. The MCMC estimation approach of Johnson and Jenkins (in press) and Johnson (2002) focuses on implementing a fully Bayesian method for joint estimation of IRT and regression parameters. The direct estimation for normal populations (Cohen & Jiang, 1999) tries to replace the multiple regression by a model that is essentially equivalent to a multigroup IRT model with some additional restrictions on item parameters and group specific variances. The YGROUP version (von Davier & Yu, 2003) of MGROUP implements seemingly unrelated regressions (SUR; Zellner, 1962) and offers generalized least squares estimation of the latent regression model (von Davier & Yon, 2004). None of these alternatives has been entirely satisfactory.
The MCMC estimation approach could not be implemented for the operational model with multidimensional proficiencies and the full set of background variables because of the time-consuming nature of MCMC estimation. The direct estimation approach as proposed by Cohen and Jiang (1999) leads to biased estimates even in very small data examples with nonnormal marginal ability distributions (von Davier, 2003). The SUR approach to estimating multivariate regressions is identical to generalized least squares (GLS) if all equations have the same regressors and observed dependent variables (Greene, 2002), a result that cannot be expected to hold in latent regressions. Therefore, there is scope for

further research in this area.

Stochastic EM methods (SEM; Broniatowski, Celeux, & Diebolt, 1983; Celeux & Diebolt, 1985) have been suggested for use in EM algorithms where the E-step is hard to compute or approximate analytically or numerically. In an application of a stochastic EM algorithm, the E-step is handled by simulation. Depending on the nature of the simulation used in the E-step, a stochastic EM method can be a Monte Carlo EM (e.g., Wei & Tanner, 1990), a Markov chain Monte Carlo EM (e.g., McCulloch, 1994), a rejection sampling EM (e.g., Booth & Hobert, 1999), an importance sampling EM (e.g., Booth & Hobert, 1999), etc. Important applications of the algorithm in psychometrics include Fox (2003); Lee, Song, and Lee (2003); Clarkson and Gonzalez (2001); and Meng and Schilling (1996), each of which employs a Markov chain Monte Carlo algorithm (e.g., Gilks, Richardson, & Spiegelhalter, 1996) to simulate in the E-step. The importance sampling EM (e.g., Booth & Hobert, 1999) is a stochastic EM method where each integral in the E-step is approximated by an average over a random sample. This paper examines the prospect of applying the importance sampling EM method to fit the latent regression model in NAEP-like settings. We also present a comparison of the results from the importance sampling EM algorithm with those from the CGROUP version of MGROUP, which is the technique currently used in NAEP. For a low dimensional real data example, the results from the suggested method are also compared to those from the BGROUP version of MGROUP, which performs exact numerical integration and is the gold standard method in such settings. Simulation studies and real data analysis show that the stochastic EM method provides a viable alternative to CGROUP for fitting latent regression models.

2 NAEP Model, Estimation, and the Current MGROUP Method

This section first states the two-component statistical model used in NAEP. A brief description of the EM algorithm, which will be helpful later, follows.
The next subsection describes the current NAEP estimation process and the MGROUP program used at ETS.

2.0.1 The Latent Regression Model

NAEP and other similar large scale assessments implement a latent regression model utilizing an IRT measurement model. Assume that the unique p-dimensional latent proficiency vector for examinee i is θ_i = (θ_i1, θ_i2, ..., θ_ip). In NAEP, p could be 2, 3, or 5. Let us denote the response vector to the test items for examinee i as y_i = (y_i1, y_i2, ..., y_ip), where y_ik, a vector of responses, contributes information about θ_ik. The likelihood for an examinee is given by

f(y_i | θ_i) = ∏_{q=1}^{p} f_1(y_iq | θ_iq) ≡ L(θ_i; y_i).    (1)

The terms f_1(y_iq | θ_iq) above follow a univariate IRT model, usually with three-parameter logistic (3PL) or generalized partial-credit model (GPCM) likelihood terms. For reasons to be discussed later, the dependence of (1) on the item parameters is suppressed. Suppose x_i = (x_i1, x_i2, ..., x_im) are m fully measured demographic and educational characteristics for the examinee. Conditional on x_i, the examinee proficiency vector θ_i is assumed to follow a multivariate normal prior distribution, that is, θ_i | x_i ~ N(Γ'x_i, Σ). The mean parameter matrix Γ and the nonnegative definite variance matrix Σ are assumed to be the same for all examinee groups. Under this setup, L(Γ, Σ | Y, X), the (marginal) likelihood function for (Γ, Σ) based on the data (X, Y), is given by

L(Γ, Σ | Y, X) = ∏_{i=1}^{n} ∫ f_1(y_i1 | θ_i1) ... f_1(y_ip | θ_ip) φ(θ_i | Γ'x_i, Σ) dθ_i,    (2)

where n is the number of examinees, and φ(· | ·, ·) is the multivariate normal density function.

2.0.2 The EM Algorithm

The EM algorithm (e.g., Dempster et al., 1977) is commonly used to obtain maximum likelihood estimates or posterior modes in problems with missing data. In fact, the algorithm is applicable in a broad range of situations where probability models can be reexpressed as ones on augmented parameter spaces and the added parameters can be thought of as missing data. The EM algorithm is relevant for mixed models because

the random effects may be viewed as missing data and the algorithm may be used to find maximum likelihood estimates for these models. Suppose that, given an unknown parameter vector ω, data y follow a distribution f(y; ω) and that y = (y_obs, y_mis), where y_obs is the observed data and y_mis is the missing data. Suppose the objective is to obtain the maximum likelihood estimate of ω, that is, the value of ω that maximizes the observed data likelihood f(y_obs | ω) = ∫ f(y_obs, y_mis | ω) dy_mis. The EM algorithm goes through a succession of E-steps and M-steps starting from an initial value ω^(0) of the parameter vector ω. In the (t + 1)-th E-step, one computes

Q(ω | ω^(t)) = E[log{f(y_obs, y_mis | ω)} | y_obs, ω^(t)],    (3)

the expected value of the complete data loglikelihood, where the expectation is with respect to the distribution of y_mis given y_obs and ω^(t). In the corresponding M-step, one maximizes Q(ω | ω^(t)) computed in the preceding E-step with respect to ω to get ω^(t+1). The algorithm is said to have converged when two successive iterations give very similar values of the parameter vector. It is also important to monitor the value of the loglikelihood f(y_obs | ω). The EM algorithm is guaranteed to converge only to a local maximum, so it is customary to start the iterations at many points in the parameter space and then to compare the values of the likelihood at all of the local maxima. This method may be very slow to converge if the proportion of missing data is high. For simple problems, one may be able to do the averaging in (3) analytically. However, in many practical problems (e.g., in NAEP estimation), the expectation cannot be computed analytically; then one may need to use an approximation technique or a simulation technique to compute the expectation.

2.0.3 NAEP Estimation Process and the MGROUP Program

NAEP uses a three-stage estimation process for fitting the above mentioned latent regression model and making inferences. The first stage, scaling, fits a simple IRT

model (consisting of 3PL and GPCM terms) of the form in (1) to the examinee response data and estimates the item parameters. The prior distribution used in this step is not θ_i | x_i ~ N(Γ'x_i, Σ) as described above, but θ_i ~ N(0, I); that is, the subscales are assumed to be independent a priori. The second stage, conditioning, assumes that the item parameters are fixed at the estimates found in scaling and fits the model in (2) to the data, that is, estimates Γ and Σ in a first part. In the second part of the conditioning step, plausible values for all examinees are obtained using the parameter estimates obtained in scaling and the first part of conditioning; the plausible values are used to estimate examinee subgroup averages. The third stage of the NAEP estimation process, called variance estimation, estimates the variances corresponding to the examinee subgroup averages using a jackknife approach (see, e.g., Johnson & Jenkins, in press). Our research will focus on the conditioning step and assume that the scaling has already been done (i.e., the item parameters are fixed); this is the reason we suppress the dependence of (1) on the item parameters. Because we will be concerned with the conditioning step, the remaining part of the section provides a more detailed discussion of it. The first objective of this step is to estimate Γ and Σ from the data. If the θ_i's were known, the maximum likelihood estimators of Γ and Σ would be

Γ̂ = (X'X)^{-1} X' Θ, where Θ is the n × p matrix with rows θ_1', θ_2', ..., θ_n',    (4)

Σ̂ = (1/n) Σ_i (θ_i − Γ'x_i)(θ_i − Γ'x_i)'.    (5)

However, the θ_i's are actually unknown. Mislevy (1984, 1985) shows that the maximum likelihood estimates of Γ and Σ under unknown θ_i's can be obtained using an EM algorithm (Dempster et al., 1977). The EM algorithm iterates through a number of expectation steps (E-steps) and maximization steps (M-steps). The expression for (Γ_{t+1}, Σ_{t+1}), the updated

value of the parameters in the t-th M-step, is obtained as

Γ_{t+1} = (X'X)^{-1} X' Θ̄_t, where Θ̄_t is the n × p matrix with rows θ̄_1t', θ̄_2t', ..., θ̄_nt',    (6)

Σ_{t+1} = (1/n) [ Σ_i Var(θ_i | X, Y, Γ_t, Σ_t) + Σ_i (θ̄_it − Γ'_{t+1}x_i)(θ̄_it − Γ'_{t+1}x_i)' ],    (7)

where θ̄_it = E(θ_i | X, Y, Γ_t, Σ_t) is the posterior mean for examinee i given the preliminary parameter estimates of iteration t. The process is repeated until convergence of the estimates Γ and Σ. Formulae (6) and (7) require the values of the posterior means E(θ_i | X, Y, Γ_t, Σ_t) and the posterior variances Var(θ_i | X, Y, Γ_t, Σ_t) for the examinees. Correspondingly, the t-th E-step computes these two required quantities for all the examinees.

The MGROUP set of programs at ETS performs the EM algorithm mentioned above. The MGROUP program consists of two primary controlling routines called PHASE1 and PHASE2. The former does some preliminary processing while the latter directs the EM iterations. There are different versions of the MGROUP program depending on the method used to perform the E-step in PHASE2: BGROUP using numerical quadrature, NGROUP (Mislevy, 1985) using Bayesian normal theory, CGROUP (Thomas, 1993) using Laplace approximations, YGROUP (von Davier & Yu, 2003) using SUR, and so on. The BGROUP version of MGROUP, applied when the dimension of θ_i is less than or equal to two, applies numerical quadrature to approximate the integral. When the dimension of θ_i is larger than two, NGROUP, CGROUP, or YGROUP may be used. Note that NGROUP and YGROUP are not used for operational purposes for the following reasons: NGROUP assumes normality of both the likelihood and the prior and may produce biased results if these assumptions are not met (Thomas, 1993). YGROUP performs the estimation for each scale independently by implementing seemingly unrelated regressions (Zellner, 1962), which is useful for generating starting values, but does not capitalize on the covariances between the p components of the latent trait.
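The M-step updates (6) and (7) are simple enough to sketch in code. The following is a schematic illustration under our own naming conventions (it is not the MGROUP implementation): given posterior means and variances produced by any E-step, one EM cycle completes as

```python
import numpy as np

def m_step(X, theta_bar, post_var):
    """One M-step of the latent regression EM, following (6) and (7).

    X         : (n, m) matrix of predictor variables
    theta_bar : (n, p) posterior means E(theta_i | X, Y, Gamma_t, Sigma_t)
    post_var  : (n, p, p) posterior variances Var(theta_i | X, Y, Gamma_t, Sigma_t)
    """
    n = X.shape[0]
    # (6): regress the posterior means on the predictors
    Gamma = np.linalg.solve(X.T @ X, X.T @ theta_bar)        # (m, p)
    # (7): mean posterior variance plus residual cross-products
    resid = theta_bar - X @ Gamma                            # (n, p)
    Sigma = (post_var.sum(axis=0) + resid.T @ resid) / n     # (p, p)
    return Gamma, Sigma
```

The same two updates serve every version of MGROUP; the versions differ only in how the E-step produces `theta_bar` and `post_var` (quadrature, Laplace approximation, or importance sampling).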
Ignoring these covariances may be inappropriate, for example, if the latent trait θ_i is measured with very different precisions across the components θ_iq for some

or all subjects.

The following subsection provides details about CGROUP, the approach currently used operationally in several large scale assessments including NAEP. This approach is based on approximation of the posterior moments of a multivariate distribution using the Laplace approximation.

E-step. The CGROUP version of MGROUP uses the Laplace approximation of the posterior mean and variance, that is, it uses

E(θ_j | X, Y, Γ_t, Σ_t) ≈ θ̂_{j,mode} + (1/2) Σ_{r=1}^{p} G_jr G_rr ĥ_r^(3),  j = 1, 2, ..., p,    (8)

Cov(θ_j, θ_k | X, Y, Γ_t, Σ_t) ≈ G_jk + (1/2) Σ_{r=1}^{p} G_jr G_kr G_rr ĥ_r^(4) + (1/2) Σ_{r=1}^{p} Σ_{s=1}^{p} [1 − (1/2) I(r = s)] ĥ_r^(3) ĥ_s^(3) G_rs (G_js G_ks G_rr + G_js G_kr G_rs + G_jr G_ks G_rs + G_jr G_kr G_ss),  j, k = 1, 2, ..., p,    (9)

where

θ̂_{j,mode} = j-th component of the posterior mode of θ,
h(θ) = log[f(y | θ) φ(θ | Γ_t'x, Σ_t)],
G_jk = [(−∂²h/∂θ_r∂θ_s)^{-1}]_jk, the (j, k) element of the inverted negative Hessian of h, evaluated at θ̂_mode,    (10)
ĥ_r^(n) = n-th pure partial derivative of h with respect to θ_r, evaluated at θ̂_mode,

and I(r = s) is 1 if r = s and 0 otherwise. Details about formulae (8) and (9) can be found in Thomas (1993). The Laplace method does not provide an unbiased estimate of the quantity it is approximating and may provide inaccurate results if the higher order derivatives (which the Laplace method assumes to be equal to zero) are not negligible. The error of approximation for each component of the mean and covariance of θ_i is of order O(1/k²) (e.g., Kass & Steffey, 1989), where k is the number of items measuring the skill corresponding to the component. Because the number of items given to each examinee in large scale assessments such as

NAEP is not too large (making k rather small), the errors in the Laplace approximation may become nonnegligible, especially for high-dimensional θ_i's. Further, if the posterior distribution of the θ_i's is multimodal (which is not impossible, especially for a small number of items), the method can perform poorly. Therefore the CGROUP version of MGROUP is not entirely satisfactory. Figure 1 in Thomas (1993), where the posterior variance estimates of 500 randomly selected examinees using the Laplace method and exact numerical integration for two-dimensional θ_i are plotted, shows that the Laplace method provides inflated variance estimates for examinees with large posterior variance (see also Section 6 in this report). The departure may be more severe for θ_i's in higher dimensions.

3 The Suggested Stochastic EM Method

Before going into the details of the suggested technique, we provide a brief description of the importance sampling method (e.g., Geweke, 1989).

3.1 Importance Sampling

The Radon-Nikodym theorem (e.g., Bauer, 1972) forms the basis for importance sampling. This theorem states that a measure M over Ω can be expressed in terms of another measure N over Ω as

M(A) = ∫_A f dN, that is, ∫_A m(ω) dω = ∫_A f(ω) n(ω) dω,

provided N(A) = 0 implies M(A) = 0 for all A. The function f is referred to as the Radon-Nikodym derivative and is sometimes written as f = dM/dN. In the case that m, n are probability densities over Ω = R^p, the theorem implies

∫_Ω ω f(ω) n(ω) dω = ∫_Ω ω m(ω) dω,

which is equivalent to E_M(ω) = E_N(fω). This result is applied in importance sampling to approximate intractable integrals. Suppose we want to compute the expected value of a function g(ω), where the expectation is taken with respect to a probability density f(ω). We can express the expectation of g(ω) as

E_f{g(ω)} = ∫ g(ω) f(ω) dω = ∫ [g(ω) f(ω) / h(ω)] h(ω) dω    (11)

for any probability density h(ω) defined on the same sample space that is zero only where f(ω) is zero. Now, if we can generate a random sample ω_1, ω_2, ..., ω_n from the distribution with density h(ω), we can approximate E_f{g(ω)}, using (11), as

E_f{g(ω)} ≈ (1/n) Σ_{i=1}^{n} g(ω_i) f(ω_i) / h(ω_i).    (12)

The standard error corresponding to the above estimate is given by dividing the standard deviation of the importance ratios g(ω_i)f(ω_i)/h(ω_i) by √n. The standard deviation mentioned above determines the quality of the approximation. The density h(ω) is called the importance sampling density. If h(ω) is chosen so that the importance ratio g(ω)f(ω)/h(ω) is roughly constant over the range of possible values of ω (which will make the standard deviation of the importance ratios small), a fairly accurate approximation of the integral may be obtained. In particular, h(ω) should have heavier tails than the product g(ω)f(ω), since otherwise there may be a few ω_i's for which h(ω_i) is much smaller than g(ω_i)f(ω_i), and the estimate will blow up. The Radon-Nikodym theorem requires ∫_A h dω = 0 ⇒ ∫_A gf dω = 0 to hold, which can be stated in an approximate fashion as: wherever h is zero, gf needs to be zero. This is the theoretical justification for the heavier tail requirement of h over gf. The main advantages of importance sampling are that it provides an unbiased estimate of the integral and that the accuracy can be monitored by examining the standard deviation of the g(ω_i)f(ω_i)/h(ω_i) values. In some situations, the researcher knows a distribution only up to a constant, but still needs to compute the moments of the distribution. For example, often all that is available in a Bayesian analysis is a multiple of the posterior (obtained by multiplying the likelihood and the prior), but not the posterior itself. Importance sampling can be used in this situation as well. Suppose, in the above setup, one does not know f(ω), but does know q(ω) = c·f(ω), where the constant c is unknown.
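As a toy numerical check of (12) (our own example, not part of the MGROUP setup): take f = N(0, 1) and g(ω) = ω², so that E_f{g(ω)} = 1, and use the heavier-tailed importance density h = N(0, 2²):

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(x, mu=0.0, sd=1.0):
    # density of N(mu, sd^2)
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# target f = N(0, 1), g(w) = w^2 (so the true expectation is 1),
# heavier-tailed importance sampling density h = N(0, 2^2)
n = 50_000
w = rng.normal(0.0, 2.0, size=n)
ratios = (w ** 2) * norm_pdf(w) / norm_pdf(w, 0.0, 2.0)  # g(w) f(w) / h(w)
est = ratios.mean()                                      # approximates E_f[g] = 1
se = ratios.std(ddof=1) / np.sqrt(n)                     # Monte Carlo standard error
```

Note that the standard error of the importance ratios, not just the point estimate, is computed: it is what tells us whether h was a good choice.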
The expectation of a function g(ω) is then given by

E_f{g(ω)} = ∫ g(ω) q(ω) dω / ∫ q(ω) dω.

The numerator on the right-hand side above is approximated by

∫ g(ω) q(ω) dω = ∫ [g(ω) q(ω) / h(ω)] h(ω) dω ≈ (1/n) Σ_{i=1}^{n} g(ω_i) q(ω_i) / h(ω_i),

while the denominator is approximated by

(1/n) Σ_{i=1}^{n} q(ω_i) / h(ω_i)

(see, e.g., Gelman, Carlin, Stern, & Rubin, 2003). Geweke (1989) shows that the resulting estimate converges almost surely to the estimand under weak assumptions and that a central limit theorem (establishing that n^{1/2}(Ê_f{g(ω)} − E_f{g(ω)}) → N(0, τ²), where τ² can be estimated consistently) applies under stronger assumptions. The estimation of the ratio of two quantities by the ratio of estimates of them is known as ratio estimation. For more details about ratio estimation see, for example, Cochran (1977).

3.2 Importance Sampling in the E-step of the MGROUP Program

This paper suggests approximating the posterior expectation and variance of the examinee proficiencies θ_i in (6) and (7) using the importance sampling method. The posterior distribution of θ_i, denoted p(θ_i | X, Y, Γ_t, Σ_t), is given by

p(θ_i | X, Y, Γ_t, Σ_t) ∝ f(y_i1 | θ_i1) ... f(y_ip | θ_ip) φ(θ_i | Γ_t'x_i, Σ_t),    (13)

using (2). The proportionality constant in (13) is a function of y_i, Γ_t, and Σ_t. Let us denote

q(θ_i | X, Y, Γ_t, Σ_t) ≡ f(y_i1 | θ_i1) ... f(y_ip | θ_ip) φ(θ_i | Γ_t'x_i, Σ_t).    (14)

We drop the subscript i for convenience for the rest of the section and let θ denote the proficiency of an examinee. We have to compute the mean and variance of p(θ | X, Y, Γ_t, Σ_t), that is, the quantities

E(θ | X, Y, Γ_t, Σ_t) ≡ ∫ θ p(θ | X, Y, Γ_t, Σ_t) dθ, and    (15)

Var(θ | X, Y, Γ_t, Σ_t) ≡ ∫ (θ − E(θ | X, Y, Γ_t, Σ_t))(θ − E(θ | X, Y, Γ_t, Σ_t))' p(θ | X, Y, Γ_t, Σ_t) dθ.    (16)
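A one-dimensional toy version of this computation (our own illustration with a made-up density; the operational case is p-dimensional) applies ratio estimation to an unnormalized target q, here an unnormalized N(1, 1), using a heavy-tailed t importance density with 4 degrees of freedom of the kind used later in this report:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def q(theta):
    # unnormalized target density: N(1, 1) up to a constant
    return np.exp(-0.5 * (theta - 1.0) ** 2)

def t4_pdf(x, loc, scale):
    # density of loc + scale * T, where T follows a t distribution with 4 df
    z = (x - loc) / scale
    c = math.gamma(2.5) / (math.gamma(2.0) * math.sqrt(4.0 * math.pi))
    return c * (1.0 + z ** 2 / 4.0) ** (-2.5) / scale

n = 60_000
loc, scale = 1.0, 1.2                      # centered near the mode of q
theta = loc + scale * rng.standard_t(4, size=n)
w = q(theta) / t4_pdf(theta, loc, scale)   # importance ratios q / h
post_mean = np.sum(theta * w) / np.sum(w)                    # ratio estimate of (15)
post_var = np.sum((theta - post_mean) ** 2 * w) / np.sum(w)  # ratio estimate of (16)
```

Because the estimator is a ratio, the unknown normalizing constant of q cancels, which is exactly what makes this workable when only the likelihood-times-prior product (14) is available.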

Using ideas from the above description of the importance sampling method, if we can generate a random sample θ_1, θ_2, ..., θ_n from a distribution h(θ) approximating p(θ | X, Y, Γ_t, Σ_t) reasonably well, we can approximate (15) by the ratio of

(1/n) Σ_{j=1}^{n} θ_j q(θ_j | X, Y, Γ_t, Σ_t) / h(θ_j)

and

(1/n) Σ_{j=1}^{n} q(θ_j | X, Y, Γ_t, Σ_t) / h(θ_j).

Similarly, we can approximate (16) by the ratio of

(1/n) Σ_{j=1}^{n} (θ_j − Ê(θ | X, Y, Γ_t, Σ_t))(θ_j − Ê(θ | X, Y, Γ_t, Σ_t))' q(θ_j | X, Y, Γ_t, Σ_t) / h(θ_j)

and

(1/n) Σ_{j=1}^{n} q(θ_j | X, Y, Γ_t, Σ_t) / h(θ_j).

Following Booth and Hobert (1999), who apply the importance sampling EM method for fitting generalized linear mixed models, we use a multivariate t importance sampling density. Booth and Hobert (1999) argue that, in general, a multivariate t density is a better choice as an importance sampling density than a normal density. One reason is that the former is heavy-tailed while the latter is not, and a good importance sampling density should be heavy-tailed when the density it approximates may be so; also, the t density is easy to sample from. The degrees of freedom of the t density is taken to be four because a low degrees of freedom ensures that the t density has a heavy tail. (The choice of optimal degrees of freedom is an issue requiring further investigation.) The mean and variance of the t importance sampling density are taken as input from a nonstochastic version of MGROUP. In our implementation, the YGROUP version (von Davier & Yu, 2003) is used, which conveniently provides input for the importance sampling density in the stochastic EM algorithm. It is also possible to estimate variances of the parameter estimates obtained from the EM algorithm, as suggested by, for example, Booth and Hobert (1999). However, we are

more interested in the estimation of the latent regression parameters in this work and do not delve into that issue here. Note that a number of studies have successfully applied other forms of stochastic EM methods, such as rejection sampling EM (Booth & Hobert, 1999) and Markov chain Monte Carlo EM, or MCMC-EM (e.g., McCulloch, 1994), but we do not explore those approaches, mainly because of the difficulties in their application to our problem.

4 MCEM in MGROUP

This section describes the implementation of MCEM (short for Monte Carlo EM), the version of MGROUP that uses importance sampling to generate the necessary statistics in the E-step. Because of the presence of a stochastic component in the E-step, this technique is also called a stochastic EM algorithm. Following the notes on the implementation, differences between assessing convergence in nonstochastic and in stochastic optimization algorithms will be discussed.

4.1 Implementing MCEM

The MCEM algorithm was integrated into YGROUP (von Davier & Yu, 2003), which is based on BGROUP and implements SUR (Zellner, 1962) and various other modifications. Following Booth and Hobert (1999), importance sampling is used in the E-step to generate the statistics (here, posterior means, residual components, and measurement error components) necessary to perform the EM iterations for the latent regression. The implementation uses importance sampling employing a multivariate t density (as in Booth & Hobert, 1999). The MCEM version as implemented in YGROUP uses the SUR posterior means and the posterior residual (co-)variance matrix as fast and efficient means to compute the mean and variance of the importance sampling density. As in all versions of MGROUP, optional starting values for the latent regression (Γ, Σ) can be provided but are not essential. As an alternative, initial iterations with a nonstochastic E-step of one of the other versions can be carried out to generate starting values. The initial sample size of the importance sample can be chosen to be much smaller
The initial sample size of the importance sample can be chosen to be much smaller 14

than the final sample size, in order to speed up initial iterations, where individual accuracy is less important than approaching a good starting value for the aggregate parameters. The examples below use 500 as the initial importance sample size and 6,000 as the final importance sample size.

4.2 Determining Convergence of the MCEM Algorithm

When applying a stochastic EM algorithm, it is possible that the EM algorithm has not converged but appears to have done so after an EM step (i.e., updated parameters and likelihoods do not change much from the previous step) because of an unlucky random sample. One should therefore use a stricter convergence criterion, or a larger variety of criteria that all have to be met simultaneously, in order to ensure the convergence of the algorithm. We monitor the likelihood function and the parameter vector for convergence and conclude that convergence has occurred only when the relative change in both of these quantities is less than ɛ in five successive iterations. The likelihood increases over the iterations in a nonstochastic EM algorithm (see, e.g., Dempster et al., 1977), while this may not be so for a stochastic EM algorithm because of Monte Carlo error. In order to assess convergence in stochastic optimization algorithms, the inevitable nonmonotonicity of these algorithms has to be taken into account. Each E-step in our implementation employs a different set of points (the importance sample) in the multivariate quadrature grid in order to perform the integration. This means that the likelihood of each observed response vector may vary, and hence the likelihood of the whole sample may vary, even if all other parameters are unchanged, from one importance sample to the next. Therefore, in our implementation, we monitor absolute changes in the log likelihood, the regression parameters Γ, and the residual (co-)variance matrix Σ simultaneously.
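The stricter rule described above (relative changes below ɛ in five successive iterations) can be sketched as follows; this is our own illustration, not the MGROUP code:

```python
from collections import deque

def make_convergence_checker(eps=1e-4, run=5):
    """Declare convergence only when every monitored relative change has
    stayed below eps for `run` successive iterations."""
    history = deque(maxlen=run)  # sliding window of the last `run` iterations

    def update(rel_change_loglik, rel_change_params):
        # record whether *all* monitored changes were small this iteration
        history.append(max(rel_change_loglik, rel_change_params) < eps)
        # converged only if the window is full and every entry passed
        return len(history) == run and all(history)

    return update
```

A single large change forces at least five further quiet iterations before the checker can fire again, which guards against stopping on one unlucky importance sample.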
The maximum absolute changes are checked against user-defined stopping criteria, which are used to decide whether a significant change has occurred that demands further iterations towards the optimum. The algorithm stops if, and only if, all three absolute changes are smaller than user-defined numbers. In addition, the absolute changes of previous

iterations are integrated with the current change by using the following approach: For cycle t, define the average absolute maximum change (AAMC) as

AAMC_{x,t} = p · AMC_{x,t} + (1 − p) · AAMC_{x,t−1},

which averages the current absolute maximum change (AMC) and the previous average absolute maximum change (AAMC). Here x denotes a parameter (vector) or a function of parameters (for example, the maximum change in regression parameters in our case); t denotes the current iteration (cycle) of the algorithm; t − 1, the previous cycle; and (1 − p), with 0 ≤ 1 − p ≤ 1.0, the weight of the averaged previous cycles' criterion. This criterion ensures that, if p < 1, more than the current change (which might be small due to the stochastic nature of the algorithm) is taken into account when deciding whether to stop iterations. If p = 1, we have a regular (no-memory) stopping criterion, which stops whenever the current iteration alone meets the stopping rule. In the current implementation, the stopping rule fades out the past absolute changes rather slowly. We found that p = 0.4, together with a small AAMC bound on the relative change in likelihood, regression weights, and variances, works reasonably well. This choice balances infinite iterations against premature termination of the algorithm in the examples presented here. The rule for stopping the algorithm is that AAMC_{x,t} must fall below this bound for termination, a maximum change bound similar to what Booth and Hobert (1999) report.

5 Analysis of Simulated Data

This section presents the first proof-of-concept example. The simulated data set used for this example matches the structure and size of the 2000 NAEP mathematics assessment at grade 4 and comes from a previous study (von Davier & Yu, 2003). The assessment has five scales: (a) Number and Operations, (b) Measurement, (c) Geometry, (d) Data Analysis, and (e) Algebra.
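The AAMC smoothing and stopping rule described in Section 4.2 can be sketched as follows (a schematic illustration; the bound used in the example calls is a placeholder, not the operational value):

```python
def aamc_update(amc, aamc_prev, p=0.4):
    """AAMC_{x,t} = p * AMC_{x,t} + (1 - p) * AAMC_{x,t-1}: an
    exponentially weighted average of current and past absolute
    maximum changes."""
    return p * amc + (1.0 - p) * aamc_prev

def should_stop(aamc_by_quantity, bound):
    """Terminate only when the smoothed change is below the bound for
    every monitored quantity (log likelihood, Gamma, Sigma)."""
    return all(a < bound for a in aamc_by_quantity.values())

# With p = 0.4, a current change of zero still leaves 60% of the previous
# averaged change in the criterion, guarding against one lucky quiet cycle.
smoothed = aamc_update(0.0, 1.0)   # close to 0.6
```

This is exactly the memory effect described above: a single small change cannot stop the algorithm on its own unless the recent history of changes has also been small.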
This section applies the stochastic EM method to find the parameter estimates for the simulated data set, which involves responses of 13,511 students. Each student responds to 1 of 26 sets of items (booklets). Each booklet contains three blocks of items (both multiple-choice and constructed-response) out of a total of 145 items. Each examinee has information on 381 predictors that are used in the latent

regression model. This section compares the MCEMGROUP and operational CGROUP estimates of the posterior means and standard deviations of examinees, the regression parameters Γ, and the residual variance-covariance matrix Σ. As an additional benchmark, the results of both CGROUP and MCEMGROUP are compared against the generating values, that is, the individual latent trait vectors used to simulate the data.

5.1 Estimates of Γ and Σ for the Simulated Data

Figure 1 shows a comparison of the CGROUP and MCEMGROUP regression parameter estimates (i.e., estimates of the individual parameters in Γ) for the 2000 NAEP mathematics assessment at grade 4 by subscale. The figure points to the extreme closeness of the estimates from the two approaches. Table 1 shows the residual variance-covariance estimates (i.e., estimates of the individual parameters in Σ) from CGROUP and MCEMGROUP for the simulated data set. The table shows slightly larger estimated variances for the MCEM version of MGROUP as compared to CGROUP. The pattern is the opposite for the estimated covariances. The differences between the covariance estimates of CGROUP and MCEMGROUP seem smaller than the differences between the variance estimates of the two approaches.

5.2 Subgroup Estimates for the 2000 NAEP Mathematics Assessment at Grade 4

Figures 2 to 4 present recovery plots for CGROUP and MCEMGROUP. The figures also show subplots comparing the individual means and standard deviations from CGROUP and MCEMGROUP. Each subscale has four subplots. The recovery plots are based on individual examinee statistics and hence will show more variance than that in the population parameters Γ and Σ. In the recovery plots, the generating values are the reference, and the conditional posterior moments (given the responses, the background information, and the maximum likelihood estimates of Γ and Σ, not the random draws

Figure 1. Regression parameter (Γ) estimates from CGROUP and MCEMGROUP for the 2000 NAEP mathematics assessment at grade 4, with one panel of all Γ estimates per subscale (Number & Operations, Measurement, Geometry, Data Analysis, Algebra).

of these parameters used when generating imputations with MGROUP) as generated by CGROUP and MCEMGROUP are plotted against these reference values. For all subscale displays, the ideal recovery or agreement line, respectively, is indicated by the main diagonal given in each plot. Figures 2 to 4 show no obvious differences between CGROUP and MCEMGROUP when comparing the plots showing recovery of the generating θ values. This observation finds additional support when examining the plot of both posterior mean estimates. The posterior means generated by CGROUP and MCEMGROUP fall nicely around the diagonal line, and the plot is much narrower than the truth recovery plots. This indicates that CGROUP and MCEMGROUP agree quite well when producing posterior means, a statement that cannot be made for the posterior standard deviation.

Figure 2. Posterior moments for the Number & Operations and Measurement subscales: CGROUP and MCEMGROUP posterior means compared against the generating (true) θ values, and the CGROUP and MCEMGROUP posterior means and standard deviations compared against each other.

Figure 3. Posterior moments for the Geometry and Data Analysis subscales: CGROUP and MCEMGROUP posterior means compared against the generating (true) θ values, and the CGROUP and MCEMGROUP posterior means and standard deviations compared against each other.

Table 1. Residual Variance-Covariance Estimates From CGROUP and MCEMGROUP, by subscale (Number & Operations, Measurement, Geometry, Data Analysis, Algebra). Note. Based on the operational population model for the 2000 NAEP mathematics assessment at grade 4.

The posterior standard deviations given by MCEMGROUP are considerably larger than those for CGROUP for all subscales. It is to be determined whether this is a systematic difference, whether it is due to Monte Carlo inflation of the variance, or whether it depends on some other structural variable such as the correlation between scales or the sample size. Table 2 compares the subgroup means from CGROUP and MCEMGROUP for relevant subgroups; there seems to be little difference between the two methods from this aspect as well. The Table 2 results show that MCEMGROUP does not perform much differently from CGROUP when reproducing group-level statistics such as subgroup means and variances.

Figure 4. Posterior moments for the Algebra subscale: CGROUP and MCEMGROUP posterior means compared against the generating (true) θ values, and the CGROUP and MCEMGROUP posterior means and standard deviations compared against each other.

6 Analysis of Real NAEP Data: 2003 Reading Assessment at Grade 4

The 2003 NAEP reading assessment at grade 4 has data from 187,581 examinees in the fourth grade or 9 years of age. Each student answers a literary block (to assess the ability to read for literary experience) and an informative block (to assess the ability to read for information). Each block consists of 10 to 12 items, with a mixture of multiple-choice and constructed-response items. We will refer to the two skills (subscales) measured by the assessment as literary and information. The combined item sample for these two scales has 111 items in total; 688 background variables are used in the latent regression model. Because the θ_i s are two-dimensional, it is possible to apply BGROUP (which performs exact numerical integration in the E-step) here. BGROUP provides the gold standard, and we will compare the results obtained from the application of the stochastic EM method against it. Comparison with results from CGROUP will be provided as well.
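Because the latent trait is only two-dimensional here, a BGROUP-style E-step can be written as a direct summation over a quadrature grid. The following sketch (an illustrative toy, with a made-up Gaussian `loglik` standing in for the actual IRT likelihood) computes a posterior mean that way:

```python
import numpy as np

def quadrature_posterior_mean(loglik, mean, cov, grid_pts=41, half_width=4.0):
    """E-step posterior mean for a 2-dimensional latent trait by numerical
    quadrature: evaluate likelihood times bivariate normal prior on a grid
    and form the weighted average of the grid points."""
    sd = np.sqrt(np.diag(cov))
    axes = [np.linspace(m - half_width * s, m + half_width * s, grid_pts)
            for m, s in zip(mean, sd)]
    g1, g2 = np.meshgrid(*axes, indexing="ij")
    theta = np.stack([g1.ravel(), g2.ravel()], axis=1)
    d = theta - mean
    prec = np.linalg.inv(cov)
    log_prior = -0.5 * np.einsum("ni,ij,nj->n", d, prec, d)  # up to a constant
    log_post = loglik(theta) + log_prior
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    return w @ theta

# Toy usage: likelihood N(obs; theta, I) with prior N(0, I) gives the
# posterior mean obs / 2 analytically, so the quadrature can be checked.
obs = np.array([1.0, -0.5])
loglik = lambda th: -0.5 * np.sum((th - obs) ** 2, axis=1)
pm = quadrature_posterior_mean(loglik, mean=np.zeros(2), cov=np.eye(2))
```

The cost of such a grid grows exponentially with the number of dimensions, which is why the report restricts BGROUP to low-dimensional problems and turns to stochastic integration otherwise.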

Table 2. Comparison of Subgroup Estimates From CGROUP and MCEMGROUP (subgroup means on the NumOp, Meas, Geom, DA, and Alg subscales for the subgroups Overall, Male, Female, White, Black, Hispanic, Asian, and Amerind).

6.1 Estimates of Γ and Σ for the Real Data

Table 3 shows the residual variance estimates ˆΣ as generated by BGROUP, CGROUP, and MCEMGROUP for the NAEP data.

Table 3. Residual Variances, Covariances, and Correlations for the 2003 NAEP Reading Assessment at Grade 4 Data, for BGROUP, CGROUP, and MCEMGROUP. Note. Lit. = Literary, Info. = Information. Residual variances = main diagonals, covariances = upper off-diagonals, and correlations = lower off-diagonals.

The three estimates for the residual correlation and the residual variances are close. Interestingly, the estimates generated by MCEMGROUP are slightly smaller than the corresponding CGROUP

estimates for the reading data, whereas the residual variance estimates for the simulated math data were larger for MCEMGROUP than for CGROUP. As an outcome, the correlation is higher for MCEMGROUP than for CGROUP. Whether this is an effect of the sample size or the number of scales needs further investigation. Figure 5 shows the regression parameter estimates as generated by CGROUP and MCEMGROUP plotted against the estimates generated by the gold standard for this data example, BGROUP.

Figure 5. Regression coefficients for the 2003 NAEP reading assessment at grade 4: all Γ estimates for the literary and information scales from CGROUP and MCEMGROUP plotted against BGROUP.

The figure shows that the 688 regression coefficient estimates each for the literary and the information scales agree to a large extent between the different versions of MGROUP. Almost all points are found on the main diagonal, which indicates that the ˆΓ estimates produced by BGROUP, CGROUP, and MCEMGROUP are virtually identical.

6.2 Subgroup Estimates for the 2003 NAEP Reading Assessment at Grade 4

Figure 6 presents plots of the posterior means as generated by CGROUP and MCEMGROUP compared to the posterior means generated by BGROUP, and Figure 7 compares the corresponding posterior standard deviations.

Figure 6. Posterior means as generated by CGROUP and MCEMGROUP, plotted against the reference, BGROUP, for the NAEP 2003 grade 4 reading data (literary and information scales).

In contrast to the first example (that involving simulated data), there is no obvious large difference between the estimates of the posterior means or standard deviations (SDs) generated by CGROUP and MCEMGROUP. Both approaches differ slightly from the estimates generated by BGROUP. CGROUP overestimates the posterior standard deviation for larger values when compared to BGROUP. The graph comparing BGROUP and CGROUP posterior SDs shows this effect in a way very similar to what Thomas (1993) reports in his Figure 1. MCEMGROUP, on the other hand, stays closer to BGROUP for large posterior standard deviations than CGROUP does. For small standard deviations, MCEMGROUP seems to produce slightly larger values than BGROUP, while the estimates generated by CGROUP and BGROUP agree better for smaller standard deviations. Except for a few outliers, the posterior means as generated by MCEMGROUP agree nicely with the values produced by both BGROUP and CGROUP. There is no obvious

Figure 7. Posterior standard deviations as generated by CGROUP and MCEMGROUP, plotted against the reference, BGROUP, for the 2003 NAEP reading assessment at grade 4 data (literary and information scales).

indication of systematic over- or underestimation for the vast majority of subjects. The obvious outliers seen in these plots are subject to ongoing research aimed at optimizing and stabilizing the current implementation of MCEMGROUP. It will be investigated whether the stochastic integration failed to produce meaningful results due to poor coverage of the true posterior by the importance sampling distribution, whether these values are caused by the small size of the importance sample, or whether there are other, less obvious reasons. Table 4 shows the subgroup estimates and the corresponding standard deviations (in parentheses) provided by the three methods. There is hardly any difference between CGROUP and MCEMGROUP so far as subgroup estimates are concerned. However, when compared to BGROUP results, both CGROUP and MCEMGROUP slightly underestimate the subgroup means.

Table 4. Comparison of Subgroup Estimates From BGROUP, CGROUP, and MCEMGROUP

Pop. sub.   BGROUP Lit.  BGROUP Info.  CGROUP Lit.  CGROUP Info.  MCEMGROUP Lit.  MCEMGROUP Info.
Overall     (0.95)       (0.98)        (0.96)       (0.99)        (0.94)          (0.98)
Male        (0.96)       (1.00)        (0.96)       (1.01)        (0.94)          (0.99)
Female      (0.94)       (0.97)        (0.94)       (0.98)        (0.93)          (0.96)
White       (0.87)       (0.89)        (0.87)       (0.90)        (0.86)          (0.89)
Black       (0.92)       (0.92)        (0.92)       (0.93)        (0.91)          (0.91)
Hisp.       (0.94)       (0.94)        (0.94)       (0.95)        (0.93)          (0.93)
Asian       (0.93)       (0.97)        (0.94)       (0.98)        (0.92)          (0.96)
AmInd       (0.94)       (0.94)        (0.95)       (0.95)        (0.92)          (0.93)
Public      (0.96)       (0.99)        (0.96)       (1.00)        (0.94)          (0.98)
Private     (0.83)       (0.87)        (0.83)       (0.88)        (0.82)          (0.86)

Note. Pop. sub. = Population subgroup, Lit. = Literary, Info. = Information, Hisp. = Hispanic, Asian = Asian Pacific, AmInd = American Indian. Standard deviations are shown in parentheses.

7 Conclusions and Future Work

CGROUP is the current operational method used in large-scale assessments such as NAEP. Though CGROUP provides more accurate results than its predecessor (NGROUP), it is not without problems, as demonstrated by Thomas (1993); in particular, CGROUP is found to inflate variance estimates for examinees with large posterior variances. As of now, there is no entirely satisfactory alternative to CGROUP. As this work shows, a stochastic EM method using importance sampling provides a viable alternative to CGROUP. Application of the importance sampling EM method to a simulated data set and a real data set reveals interesting facts. The regression parameter estimates are extremely close for BGROUP (the gold standard), CGROUP (the current operational version of MGROUP used in NAEP, etc.), and the newly proposed MCEMGROUP. The estimates of conditional posterior means seem

to be quite close for CGROUP and MCEMGROUP. In the real data example, both of these sets of estimates are very close to the BGROUP estimates. The MCEM algorithm produced a few outlying estimated conditional posterior means, especially in the real data example, that may be due to Monte Carlo error; this issue needs further investigation. Statements about agreement between posterior standard deviations cannot be made clearly as of now. For the real data example, which has a large sample size, CGROUP provides inflated estimates of posterior SDs for examinees with large SDs, while MCEMGROUP does not. However, MCEMGROUP results in a few outliers in the middle of the range of the SDs. Interestingly, the pattern is quite different for the simulated data, where the MCEMGROUP estimates of posterior SDs are bigger than those from CGROUP. Overall, the stochastic EM method performs slightly better than the CGROUP version of MGROUP. One problem with the stochastic EM method is that it is very time-consuming. For example, for our real data example (2003 reading data), while BGROUP takes 8 hours and CGROUP takes 4 hours on a Pentium IV computer with 2.2 GHz, MCEMGROUP takes about 48 hours, which is longer than CGROUP by a factor of 12. However, with the advent of increasingly fast computers, application of MCEMGROUP will become more feasible; at least the method can be used to get a second opinion right now.

It is possible to improve the efficiency of the importance sampling EM method even further. Let us consider estimation of E(θ | X, Y, Γ_t, Σ_t), which is given in Equation 15. Let θ_0 denote the mode of p(θ | X, Y, Γ_t, Σ_t). We have

E(θ | X, Y, Γ_t, Σ_t) = ∫ θ p(θ | X, Y, Γ_t, Σ_t) dθ
  ∝ ∫ θ exp{u(θ)} dθ,  for exp{u(θ)} ∝ p(θ | X, Y, Γ_t, Σ_t)
  = ∫ θ exp{u(θ_0) + ½ (θ − θ_0)′ u″(θ_0) (θ − θ_0) + ∆(θ)} dθ,  as u′(θ_0) = 0,
  = e^{u(θ_0)} ∫ θ exp{−½ (θ − θ_0)′ [−u″(θ_0)] (θ − θ_0)} exp(∆(θ)) dθ,

where ∆(θ) = u(θ) − u(θ_0) − ½ (θ − θ_0)′ u″(θ_0) (θ − θ_0). The first term under exponentiation in the last step forms a normal distribution, and ∆(θ) is much less variable (and close to zero) over the range of θ than is u(θ).
Therefore, using the above normal distribution as the importance sampling density should improve the efficiency of the method, because the importance weights then reduce to a constant times exp(∆(θ)), which varies little over the range of θ.
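In one dimension, the suggested scheme amounts to the following sketch (hypothetical throughout: the item difficulties and responses are made up, and a logistic likelihood with a N(0, 1) prior stands in for the actual model):

```python
import numpy as np

rng = np.random.default_rng(11)

# Made-up data: five Rasch-type responses y at difficulties b, N(0, 1) prior.
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])
y = np.array([1, 1, 0, 1, 0])

def u(theta):
    """Unnormalized log posterior u(theta)."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return -0.5 * theta ** 2 + np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def u1(theta):  # first derivative u'(theta)
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return -theta + np.sum(y - p)

def u2(theta):  # second derivative u''(theta); negative everywhere here
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return -1.0 - np.sum(p * (1 - p))

# Newton steps to the posterior mode theta_0; the normal approximation
# N(theta_0, -1/u''(theta_0)) then serves as the importance density.
theta0 = 0.0
for _ in range(25):
    theta0 -= u1(theta0) / u2(theta0)
var0 = -1.0 / u2(theta0)

draws = theta0 + np.sqrt(var0) * rng.standard_normal(4000)
log_phi = -0.5 * (draws - theta0) ** 2 / var0        # proposal, up to a constant
log_w = np.array([u(t) for t in draws]) - log_phi    # = Delta(theta) + constant
w = np.exp(log_w - log_w.max())
post_mean = float(np.sum(w * draws) / w.sum())
ess = float(w.sum() ** 2 / np.sum(w ** 2))           # effective sample size
```

Because the weights reduce to a constant times exp(∆(θ)), they are nearly flat, so the effective sample size stays close to the nominal number of draws.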


More information

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY ECO 513 Fall 2008 MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY SIMS@PRINCETON.EDU 1. MODEL COMPARISON AS ESTIMATING A DISCRETE PARAMETER Data Y, models 1 and 2, parameter vectors θ 1, θ 2.

More information

Direct Simulation Methods #2

Direct Simulation Methods #2 Direct Simulation Methods #2 Econ 690 Purdue University Outline 1 A Generalized Rejection Sampling Algorithm 2 The Weighted Bootstrap 3 Importance Sampling Rejection Sampling Algorithm #2 Suppose we wish

More information

Estimating the parameters of hidden binomial trials by the EM algorithm

Estimating the parameters of hidden binomial trials by the EM algorithm Hacettepe Journal of Mathematics and Statistics Volume 43 (5) (2014), 885 890 Estimating the parameters of hidden binomial trials by the EM algorithm Degang Zhu Received 02 : 09 : 2013 : Accepted 02 :

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Bayesian Nonparametric Rasch Modeling: Methods and Software

Bayesian Nonparametric Rasch Modeling: Methods and Software Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Choice of Anchor Test in Equating

Choice of Anchor Test in Equating Research Report Choice of Anchor Test in Equating Sandip Sinharay Paul Holland Research & Development November 2006 RR-06-35 Choice of Anchor Test in Equating Sandip Sinharay and Paul Holland ETS, Princeton,

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Hierarchical Cognitive Diagnostic Analysis: Simulation Study

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Hierarchical Cognitive Diagnostic Analysis: Simulation Study Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 38 Hierarchical Cognitive Diagnostic Analysis: Simulation Study Yu-Lan Su, Won-Chan Lee, & Kyong Mi Choi Dec 2013

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

The Metropolis-Hastings Algorithm. June 8, 2012

The Metropolis-Hastings Algorithm. June 8, 2012 The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings

More information

Basic IRT Concepts, Models, and Assumptions

Basic IRT Concepts, Models, and Assumptions Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction

More information

Bayesian Analysis of Latent Variable Models using Mplus

Bayesian Analysis of Latent Variable Models using Mplus Bayesian Analysis of Latent Variable Models using Mplus Tihomir Asparouhov and Bengt Muthén Version 2 June 29, 2010 1 1 Introduction In this paper we describe some of the modeling possibilities that are

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Equating of Subscores and Weighted Averages Under the NEAT Design

Equating of Subscores and Weighted Averages Under the NEAT Design Research Report ETS RR 11-01 Equating of Subscores and Weighted Averages Under the NEAT Design Sandip Sinharay Shelby Haberman January 2011 Equating of Subscores and Weighted Averages Under the NEAT Design

More information

Variational Bayesian Inference for Parametric and Non-Parametric Regression with Missing Predictor Data

Variational Bayesian Inference for Parametric and Non-Parametric Regression with Missing Predictor Data for Parametric and Non-Parametric Regression with Missing Predictor Data August 23, 2010 Introduction Bayesian inference For parametric regression: long history (e.g. Box and Tiao, 1973; Gelman, Carlin,

More information

Bayesian Mixture Modeling

Bayesian Mixture Modeling University of California, Merced July 21, 2014 Mplus Users Meeting, Utrecht Organization of the Talk Organization s modeling estimation framework Motivating examples duce the basic LCA model Illustrated

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

LINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006

LINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006 LINKING IN DEVELOPMENTAL SCALES Michelle M. Langer A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Statistical and psychometric methods for measurement: G Theory, DIF, & Linking

Statistical and psychometric methods for measurement: G Theory, DIF, & Linking Statistical and psychometric methods for measurement: G Theory, DIF, & Linking Andrew Ho, Harvard Graduate School of Education The World Bank, Psychometrics Mini Course 2 Washington, DC. June 27, 2018

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

2 Bayesian Hierarchical Response Modeling

2 Bayesian Hierarchical Response Modeling 2 Bayesian Hierarchical Response Modeling In the first chapter, an introduction to Bayesian item response modeling was given. The Bayesian methodology requires careful specification of priors since item

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

Investigation into the use of confidence indicators with calibration

Investigation into the use of confidence indicators with calibration WORKSHOP ON FRONTIERS IN BENCHMARKING TECHNIQUES AND THEIR APPLICATION TO OFFICIAL STATISTICS 7 8 APRIL 2005 Investigation into the use of confidence indicators with calibration Gerard Keogh and Dave Jennings

More information