Quasi-Monte Carlo Sampling to improve the Efficiency of Monte Carlo EM


Wolfgang Jank
Department of Decision and Information Technologies
University of Maryland
College Park, MD

November 17, 2003

Abstract

In this paper we investigate an efficient implementation of the Monte Carlo EM algorithm based on Quasi-Monte Carlo sampling. The Monte Carlo EM algorithm is a stochastic version of the deterministic EM (Expectation-Maximization) algorithm in which an intractable E-step is replaced by a Monte Carlo approximation. Quasi-Monte Carlo methods produce deterministic sequences of points that can significantly improve the accuracy of Monte Carlo approximations over purely random sampling. One drawback of deterministic Quasi-Monte Carlo methods is that it is generally difficult to determine the magnitude of the approximation error. However, in order to implement the Monte Carlo EM algorithm in an automated way, the ability to measure this error is fundamental. Recent developments in randomized Quasi-Monte Carlo methods can overcome this drawback. We investigate the implementation of an automated, data-driven Monte Carlo EM algorithm based on randomized Quasi-Monte Carlo methods. We apply this algorithm to a geostatistical model of online purchases and find that it can significantly decrease the total simulation effort, thus showing great potential for improving upon the efficiency of the classical Monte Carlo EM algorithm.

Key words and phrases: Monte Carlo error; low-discrepancy sequence; Halton sequence; EM algorithm; geostatistical model.

1 Introduction

The Expectation-Maximization (EM) algorithm (Dempster et al., 1977) is a popular tool in statistics and many other fields. One limitation to the use of EM is, however, that quite often the E-step of the algorithm involves an analytically intractable, sometimes high dimensional integral. Hobert (2000), for example, considers a model for which the E-step involves intractable integrals of dimension twenty. The Monte Carlo EM (MCEM) algorithm, proposed by Wei & Tanner (1990), estimates this intractable integral with an empirical average based on simulated data. Typically, the simulated data is obtained by producing random draws from the distribution commanded by EM. By the law of large numbers, this integral-estimate can be made arbitrarily accurate by increasing the size of the simulated data. The MCEM algorithm typically requires a very high accuracy, especially at the later iterations. Booth & Hobert (1999), for example, report sample sizes of over 66,000 at convergence. This suggests that the overall efficiency of MCEM could be improved by using simulation methods that achieve a high accuracy in the integral-estimate with smaller sample sizes.

Recent research has provided evidence that entirely random draws do not necessarily result in the most efficient use of the simulated data. In particular, one criticism of random draws is that they often do not explore the sample space well (Morokoff & Caflisch, 1995; Caflisch et al., 1997). For instance, points drawn at random tend to form clusters, which leads to gaps where the sample space is not explored at all (see Figure 1 for illustration). This criticism has led to the development of a variety of deterministic methods that provide for a better spread of the sample points. These deterministic methods are often classified as Quasi-Monte Carlo (QMC) methods. Theoretical as well as empirical research has shown that QMC methods can significantly increase the accuracy of the integral-estimate over random draws.

Figure 1 about here

In this paper we investigate an implementation of the MCEM algorithm based on QMC methods. Wei & Tanner (1990) point out that for an efficient implementation, the size of the simulated data should be chosen small at the initial stage but increased successively as the algorithm moves along. Early versions of the method require a manual, user-determined increase of the sample size, for instance, by allocating the amount of data to be simulated in each iteration before the start of the algorithm (e.g. McCulloch, 1997).

Implementations of MCEM that determine the necessary sample size in an automated, data-driven fashion have been developed only recently (see Booth & Hobert, 1999; Levine & Casella, 2001; Levine & Fan, 2003). Automated implementations of MCEM base the decision to increase the sample size on the magnitude of the error in the integral-approximation. In their seminal work, Booth & Hobert (1999) use statistical methods to estimate this error when the simulated data is generated at random. However, since QMC methods are deterministic in nature, statistical methods do not apply. Moreover, determining the error of the QMC integral-estimate analytically can be extremely hard (Caflisch et al., 1997). Recently, the development of randomized QMC methods has overcome this early drawback. Randomized Quasi-Monte Carlo (RQMC) methods combine the benefits of deterministic sampling methods, which achieve a more uniform exploration of the sample space, with the statistical advantages of random draws. A survey of recent advances in RQMC methods can be found in L'Ecuyer & Lemieux (2002).

In this work we implement an automated MCEM algorithm based on RQMC methods. Specifically, we demonstrate how to obtain a QMC sample from the distribution commanded by EM, and we use the ideas of RQMC sampling to measure the error of the integral-estimate in every iteration of the algorithm. We implement this Quasi-Monte Carlo EM (QMCEM) algorithm within the framework of the automated MCEM formulation proposed by Booth & Hobert (1999).

The remainder of this paper is organized as follows. In Section 2 we briefly motivate the ideas surrounding QMC and RQMC. In Section 3 we explain how RQMC methods can be used to implement QMCEM in an automated, data-driven fashion. We apply this algorithm to a geostatistical model of online purchases in Section 4 and conclude with final remarks in Section 5.

2 Quasi-Monte Carlo Sampling

Quasi-Monte Carlo methods can be regarded as a deterministic counterpart to classical Monte Carlo. Suppose we want to evaluate an (analytically intractable) integral

I = ∫_{C^d} f(x) dx    (1)

over the d-dimensional unit cube, C^d := [0, 1]^d.

Classical Monte Carlo integration randomly selects points x_k ~ Uniform(C^d), k = 1, ..., m, and approximates (1) by the empirical average

Ĩ = (1/m) Σ_{k=1}^{m} f(x_k).    (2)

Quasi-Monte Carlo methods, on the other hand, select the points deterministically. Specifically, QMC methods produce a deterministic sequence of points that provides the best-possible spread in C^d. These deterministic sequences are often referred to as low-discrepancy sequences (see, for example, Niederreiter, 1992; Fang & Wang, 1994). A variety of different low-discrepancy sequences exist. Examples include the Halton sequence (Halton, 1960), the Sobol sequence (Sobol, 1967), the Faure sequence (Faure, 1982), and the Niederreiter sequence (Niederreiter, 1992), but this list is not exhaustive. In this work we focus our attention on the Halton sequence since it is conceptually very appealing.

2.1 Halton Sequences

Let b be a prime number. Then any integer k, k ≥ 0, can be written in base-b representation as

k = d_j b^j + d_{j-1} b^{j-1} + ... + d_1 b + d_0,

where d_i ∈ {0, 1, ..., b-1} for i = 0, 1, ..., j. Define the base-b radical inverse function, φ_b(k), as

φ_b(k) = d_0 b^{-1} + d_1 b^{-2} + ... + d_j b^{-(j+1)}.

Notice that for every integer k ≥ 0, φ_b(k) ∈ [0, 1]. The kth element of the Halton sequence is obtained via the radical inverse function evaluated at k. Specifically, if b_1, ..., b_d are d different prime numbers, then a d-dimensional Halton sequence of length m is given by {x_1, ..., x_m}, where the kth element of the sequence is

x_k = [φ_{b_1}(k-1), ..., φ_{b_d}(k-1)]^T,    k = 1, ..., m.    (3)

(See Halton (1960) or Wang & Hickernell (2000) for more details.)

Notice that the Halton sequence does not need to be started at the origin. Indeed, for any d-vector of non-negative integers, n = (n_1, ..., n_d)^T say, the Halton sequence with the first n_i elements of the ith coordinate skipped,

x_k = [φ_{b_1}(n_1 + k - 1), ..., φ_{b_d}(n_d + k - 1)]^T,    k = 1, ..., m,    (4)

remains a low-discrepancy sequence (see Pagès, 1992; Bouleau & Lépingle, 1994). We will refer to the sequence defined by (4) as a Halton sequence with starting point n. Figure 1 shows the first 2500 elements of a two-dimensional Halton sequence with n = (0, 0)^T.
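The radical inverse construction translates directly into code. The following is a minimal sketch in Python (the paper's own simulations used Ox); the function and variable names are ours, not the paper's.

```python
import numpy as np

def radical_inverse(k, b):
    """Base-b radical inverse phi_b(k): reflect the base-b digits of k about the radix point."""
    inv, base = 0.0, 1.0 / b
    while k > 0:
        k, digit = divmod(k, b)
        inv += digit * base
        base /= b
    return inv

def halton(m, primes, start=None):
    """First m points of a d-dimensional Halton sequence with optional starting point n, as in (4)."""
    d = len(primes)
    start = [0] * d if start is None else start
    return np.array([[radical_inverse(start[i] + k, primes[i]) for i in range(d)]
                     for k in range(m)])

# Example: the first 5 points of the two-dimensional Halton sequence in bases 2 and 3.
print(halton(5, primes=[2, 3]))
```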

2.2 Randomized Quasi-Monte Carlo

Owen (1998b) points out that the main (practical) disadvantage of QMC is that determining the accuracy of the integral-estimate in (2) is typically very complicated, if not impossible. Moreover, since QMC methods are based on deterministic sequences, statistical procedures for error estimation do not apply. This drawback has led to the development of randomized Quasi-Monte Carlo (RQMC) methods. L'Ecuyer & Lemieux (2002) suggest that any RQMC sequence should have the following two properties: 1) every element of the sequence has a uniform distribution over C^d; 2) the low-discrepancy property of the sequence is preserved under the randomization. The first property guarantees that the approximation Ĩ in (2) is an unbiased estimate of the integral in (1). Moreover, one can estimate its variance by generating r independent copies of Ĩ (which is typically done by generating r independent sequences x_1^(j), ..., x_m^(j), j = 1, ..., r). Given a desired total simulation amount N = rm, smaller values of r (paired with a larger value of m) should result in a better accuracy of the integral-estimate, since this takes better advantage of the low-discrepancy property of each sequence. At the extreme, taking r = N and m = 1 simply reproduces classical Monte Carlo estimation.

2.3 Randomized Halton Sequences

Recall that, regardless of the starting point, the Halton sequence remains a low-discrepancy sequence. Wang & Hickernell (2000) use this fact to show that if the Halton sequence is started at a random point, x_1 ~ Uniform(C^d), then it satisfies the RQMC properties 1) and 2) from Subsection 2.2. In the following sections, we will use RQMC sampling based on the randomized Halton sequence.
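To make the pooling and error estimation concrete, here is a hedged sketch of an RQMC estimate of (1) in Python, reusing the halton helper sketched above. The randomization shown (a large random integer starting index per coordinate) is a simple surrogate for, not the exact form of, the Wang-Hickernell construction; all names are ours.

```python
import numpy as np

def rqmc_estimate(f, d, m, r, primes, rng):
    """Estimate the integral of f over [0,1]^d with r randomized Halton replicates of length m.

    Randomization here is a simple surrogate (random integer starting indices); it is meant
    only to illustrate the pooled estimate and its replicate-based standard error."""
    means = []
    for _ in range(r):
        start = rng.integers(0, 10**6, size=d)      # random starting point n for this replicate
        pts = halton(m, primes, start=start)        # low-discrepancy points of this replicate
        means.append(np.mean([f(x) for x in pts]))
    means = np.asarray(means)
    pooled = means.mean()                           # pooled estimate of the integral
    std_err = means.std(ddof=1) / np.sqrt(r)        # Monte Carlo standard error from the r replicates
    return pooled, std_err

# Example: integrate f(x) = x_1 * x_2 over the 2-dimensional unit cube (true value 0.25).
rng = np.random.default_rng(0)
est, se = rqmc_estimate(lambda x: np.prod(x), d=2, m=512, r=5, primes=[2, 3], rng=rng)
print(est, se)
```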

3 Quasi-Monte Carlo EM

The Expectation-Maximization (EM) algorithm (Dempster et al., 1977) is an iterative procedure useful to approximate the maximum likelihood estimator (MLE) in incomplete data problems. Let y be a vector of observed data, let u be a vector of unobserved data or random effects, and let θ denote a vector of parameters. Furthermore, let f(y, u; θ) denote the joint density of the complete data, (y, u). Let L(θ; y) = ∫ f(y, u; θ) du denote the (marginal) likelihood function for this model. The MLE, θ̂, maximizes L(·; y).

In each iteration, the EM algorithm performs an expectation and a maximization step. Let θ^(t-1) denote the current parameter value. Then, in the tth iteration of the algorithm, the E-step computes the conditional expectation of the complete data log-likelihood, conditional on the observed data and the current parameter value,

Q(θ | θ^(t-1)) = E[ log f(y, u; θ) | y; θ^(t-1) ].    (5)

The tth EM update, θ^(t), maximizes (5). That is, θ^(t) satisfies

Q(θ^(t) | θ^(t-1)) ≥ Q(θ | θ^(t-1))    (6)

for all θ in the parameter space. This is also known as the M-step. The M-step is often implemented using standard numerical methods like Newton-Raphson (see Lange, 1995). Solutions to overcome a difficult M-step have been proposed in, for example, Meng & Rubin (1993). Given an initial value θ^(0), the EM algorithm produces a sequence {θ^(0), θ^(1), θ^(2), ...} that, under regularity conditions (see Boyles, 1983; Wu, 1983), converges to θ̂.

In this work we focus on the situation when the E-step does not have a closed form solution. Wei & Tanner (1990) proposed to approximate an analytically intractable expectation in (5) by the empirical average

Q(θ | θ^(t-1)) ≈ Q̃(θ | θ^(t-1); u_1, ..., u_{m_t}) = (1/m_t) Σ_{k=1}^{m_t} log f(y, u_k; θ),    (7)

where u_1, ..., u_{m_t} are simulated from the conditional distribution f(u | y; θ^(t-1)). Then, by the law of large numbers, Q̃(θ | θ^(t-1)) will be a reasonable approximation to Q(θ | θ^(t-1)) if m_t is large enough.
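As a concrete illustration of (7), the following sketch (Python; our own function names) shows the Monte Carlo E-step and a generic MCEM iteration. The draws from f(u | y; θ^(t-1)) and the complete-data log-likelihood are assumed to be supplied by the model at hand.

```python
import numpy as np
from scipy.optimize import minimize

def q_tilde(theta, draws, loglik_complete):
    """Monte Carlo E-step (7): average complete-data log-likelihood over the simulated u_k."""
    return np.mean([loglik_complete(theta, u) for u in draws])

def mcem_step(theta_prev, m_t, sample_u, loglik_complete):
    """One MCEM iteration: simulate m_t draws from f(u | y; theta_prev), then maximize Q-tilde."""
    draws = sample_u(theta_prev, m_t)          # user-supplied sampler for f(u | y; theta_prev)
    objective = lambda theta: -q_tilde(theta, draws, loglik_complete)
    return minimize(objective, x0=theta_prev, method="Nelder-Mead").x
```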

We consider a modification of (7) suitable for RQMC sampling. Let u_1^(j), ..., u_{m_t}^(j), j = 1, ..., r, be r independent RQMC sequences of length m_t, each simulated from f(u | y; θ^(t-1)). (The details of how to simulate an RQMC sequence from f(u | y; θ^(t-1)) are deferred until Subsection 3.2.) Then, an unbiased estimate of (5) is given by the pooled estimate

Q̃^P(θ | θ^(t-1)) = (1/r) Σ_{j=1}^{r} Q̃^(j)(θ | θ^(t-1)),    (8)

where Q̃^(j)(θ | θ^(t-1)) = Q̃(θ | θ^(t-1); u_1^(j), ..., u_{m_t}^(j)) as in (7). The tth Quasi-Monte Carlo EM (QMCEM) update, θ̃^(t), maximizes Q̃^P(· | θ^(t-1)).

3.1 Increasing the length of the RQMC sequences

We have pointed out earlier that the Monte Carlo sample sizes m_t should be increased successively as the algorithm moves along. In fact, Booth et al. (2001) argue that MCEM will never converge if m_t is held fixed across iterations because of a persistent Monte Carlo error (see also Chan & Ledolter, 1995). While earlier versions of the method choose the Monte Carlo sample sizes in a deterministic fashion before the start of the algorithm (e.g. McCulloch, 1997), the same deterministic allocation of Monte Carlo resources that works well in one problem may result in a very inefficient (or inaccurate) algorithm in another problem. Thus, data-dependent (and user-independent) sample size rules are necessary in order to implement MCEM in an automated way. Booth & Hobert (1999) base the decision of a sample size increase on the noise in the parameter updates (see also Levine & Casella, 2001; Levine & Fan, 2003).

Let θ^(t-1) denote the current QMCEM parameter value and let θ̃^(t) denote the maximizer of Q̃^P(· | θ^(t-1)) in (8) based on r independent RQMC sequences, each of length m_t. Thus, θ̃^(t) satisfies

F̃^P(θ̃^(t) | θ^(t-1)) = 0,    (9)

where we define F̃^P(θ | θ') = ∂Q̃^P(θ | θ')/∂θ. Let θ^(t) denote the parameter update of the deterministic EM algorithm, that is, θ^(t) satisfies

F(θ^(t) | θ^(t-1)) = 0,    (10)

where, in similar fashion to above, we define F(θ | θ') = ∂Q(θ | θ')/∂θ. Thus, a first order Taylor expansion of F̃^P(θ̃^(t) | θ^(t-1)) about θ^(t) yields

(θ̃^(t) - θ^(t))^T S̃^P(θ^(t) | θ^(t-1)) ≈ -F̃^P(θ^(t) | θ^(t-1)),    (11)

where we define the matrix S̃^P(θ | θ') = ∂²Q̃^P(θ | θ')/∂θ∂θ^T.

Under RQMC sampling, Q̃^P is an unbiased estimate of Q. Assuming mild regularity conditions, it follows that the expectation

E[F̃^P(θ^(t) | θ^(t-1))] = F(θ^(t) | θ^(t-1)) = 0.    (12)

Therefore, the expected value of θ̃^(t) is θ^(t) and its variance-covariance matrix is given by

Var(θ̃^(t)) = [S̃^P(θ^(t) | θ^(t-1))]^{-1} Var(F̃^P(θ^(t) | θ^(t-1))) [S̃^P(θ^(t) | θ^(t-1))]^{-1}.    (13)

Under regular Monte Carlo sampling, it follows that, for a large enough Monte Carlo sample size, θ̃^(t) is approximately normally distributed with the mean and variance specified above. Under RQMC sampling, however, the accuracy of the normal approximation may depend on the number r of independent RQMC sequences. In Section 4 we consider a range of values for r in order to investigate its effect on QMCEM. In our implementations we estimate Var(θ̃^(t)) by substituting θ̃^(t) for θ^(t) in (13) and estimate Var(F̃^P(θ^(t) | θ^(t-1))) via

(1/r²) Σ_{j=1}^{r} [∂Q̃^(j)(θ | θ^(t-1))/∂θ] [∂Q̃^(j)(θ | θ^(t-1))/∂θ]^T, evaluated at θ = θ̃^(t).    (14)

Larger values of r should result in a more accurate estimate of Var(θ̃^(t)). However, we also pointed out that smaller values of r should result in a better accuracy of the Monte Carlo estimate in (8), since this takes better advantage of the low-discrepancy property of each individual sequence u_1^(j), ..., u_{m_t}^(j). We investigate the impact of this trade-off on the overall efficiency of the method in Section 4.

The QMCEM algorithm proceeds as follows. Following Booth & Hobert's recommendation, we measure the noise in the QMCEM update θ̃^(t) by constructing a (1-α)·100% confidence ellipsoid about the deterministic EM update θ^(t), using the normal approximation for θ̃^(t). If this ellipsoid contains the previous parameter value θ^(t-1), then we conclude that the system is too noisy and we increase the length m_t of the RQMC sequences. Booth et al. (2001) argue that the sample sizes should be increased at an exponential rate. Thus, we increase the sample size to m_{t+1} := (1+κ)m_t, where κ is a small number, typically κ = 0.2, 0.3, 0.4. Since stochastic algorithms, like MCEM, can satisfy deterministic stopping rules purely by chance, it is recommended to continue the method until the stopping rule is satisfied for several consecutive iterations (see also Booth & Hobert, 1999). Thus, we stop the algorithm when the relative change in two successive parameter updates is smaller than some small number δ, δ > 0, for 3 consecutive iterations.
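A hedged sketch of this updating rule follows (Python; all names are ours). The covariance of the update is assumed to come from (13)-(14); as a simplification, the confidence ellipsoid is centered at the computed update θ̃^(t) rather than at the unknown deterministic EM update.

```python
import numpy as np
from scipy.stats import chi2

def needs_larger_sample(theta_prev, theta_tilde, cov_tilde, alpha=0.25):
    """Booth-Hobert-style check: does the (1 - alpha) confidence ellipsoid around the update
    (approximated here as centered at theta_tilde) contain the previous parameter value?"""
    diff = np.asarray(theta_prev) - np.asarray(theta_tilde)
    dist = diff @ np.linalg.solve(cov_tilde, diff)        # squared Mahalanobis distance
    return dist <= chi2.ppf(1.0 - alpha, df=len(diff))    # inside the ellipsoid -> too noisy

def next_sample_size(m_t, too_noisy, kappa=0.2):
    """Increase m_t at an exponential rate, m_{t+1} = (1 + kappa) m_t, when the update is too noisy."""
    return int(np.ceil((1.0 + kappa) * m_t)) if too_noisy else m_t

def converged(history, delta=0.01, window=3):
    """Stop once the relative change is below delta for `window` consecutive iterations
    (assumes parameter values bounded away from zero)."""
    if len(history) <= window:
        return False
    rel = [np.max(np.abs((history[-i] - history[-i - 1]) / history[-i - 1]))
           for i in range(1, window + 1)]
    return all(r < delta for r in rel)
```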

3.2 Laplace Importance Sampling to generate RQMC sequences

Recall that the pooled estimate in (8) is based on r independent RQMC sequences u_1^(j), ..., u_{m_t}^(j), j = 1, ..., r, simulated from f(u | y; θ^(t-1)). In this section we demonstrate how to generate randomized Halton sequences using Laplace importance sampling. Laplace importance sampling has proven useful for drawing approximate samples from f(u | y; θ) in many instances (see Booth & Hobert, 1999; Kuk, 1999). Laplace importance sampling attempts to find an importance sampling distribution whose mean and variance match the mode and curvature of f(u | y; θ). More specifically, suppressing the dependence on y, let

l(u; θ) = log f(y, u; θ)    (15)

denote the complete data log likelihood and let l'(u; θ) and l''(u; θ) denote its first and second derivatives in u, respectively. Suppose that ũ denotes the maximizer of l, satisfying l'(ũ; θ) = 0. Then the Laplace approximations to the mean and variance of f(u | y; θ) are µ(θ) = ũ and Σ(θ) = {-l''(ũ; θ)}^{-1}, respectively (e.g. De Bruijn, 1958). Booth & Hobert (1999) as well as Kuk (1999) propose to use a multivariate normal or multivariate t importance sampling distribution, shifted and scaled by µ(θ) and Σ(θ), respectively. Let f_Lap(u | y; θ) denote the resulting Laplace importance sampling distribution.

Recall that by RQMC property 1), every element of an RQMC sequence has a uniform distribution over C^d. Let x_k be the kth element of a randomized Halton sequence. Using a suitable transformation (e.g. Robert & Casella, 1999), we can generate a d-vector of i.i.d. normal or t variates. Shifting and scaling this vector by µ(θ) and Σ(θ) results in a draw u_k from f_Lap(u | y; θ). Thus, using r independent randomized Halton sequences of length m_t, x_1^(j), ..., x_{m_t}^(j), j = 1, ..., r, we obtain r independent sequences u_1^(j), ..., u_{m_t}^(j) from f_Lap(u | y; θ). Booth & Hobert (1999) and Kuk (1999) successfully use Laplace importance sampling for the fitting of generalized linear mixed models. In the following we apply the method to an application of generalized linear mixed models to data exhibiting spatial correlation.
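A hedged sketch of this transformation (Python; only the normal variant of the importance sampler is shown, and the mode ũ and curvature are assumed to have been found by a separate inner optimization; names are ours):

```python
import numpy as np
from scipy.stats import norm

def laplace_importance_draws(uniforms, mode, neg_hessian):
    """Map RQMC uniforms in (0,1)^d to draws from a normal Laplace importance sampler.

    uniforms    : (m, d) array, e.g. a randomized Halton sequence (entries strictly in (0,1))
    mode        : Laplace mean mu(theta), i.e. the maximizer u-tilde of l(u; theta)
    neg_hessian : -l''(u-tilde; theta), whose inverse is the Laplace covariance Sigma(theta)
    """
    z = norm.ppf(uniforms)                          # inverse-CDF transform: uniform -> i.i.d. N(0,1)
    chol = np.linalg.cholesky(np.linalg.inv(neg_hessian))
    return mode + z @ chol.T                        # shift and scale: u_k = mu + Sigma^{1/2} z_k

# In the average (7), draws from f_Lap are typically accompanied by importance weights
# f(y, u; theta) / f_Lap(u | y; theta), as in standard importance sampling.
```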

4 Application: A Geostatistical Model of Online Purchases

In this section we consider sales data from an online book publisher and retailer. The publisher sells online the titles it publishes in print form as well as, more recently, also in PDF form. The publisher has good reason to believe that a customer's preference for either print or PDF form varies significantly due to his or her geographical location. In fact, since the PDF form is directly downloaded from the publisher's web site, it requires a reliable and typically fast internet connection. However, the availability of reliable internet connections varies greatly across different regions. Moreover, directly downloaded PDF files provide content immediately, without having to wait for shipment as in the case of a printed book. Thus, shipping times can also influence a customer's preference. The preference can also be affected by a customer's access to good quality printers or his/her technology readiness, all of which often exhibit strong local variability.

Data exhibiting spatial correlation can be modelled using generalized linear mixed models (e.g. Breslow & Clayton, 1993). Diggle et al. (1998) refer to these spatial applications of generalized linear mixed models as model-based geostatistics. These spatial mixed models are challenging from a computational point of view since they often involve approximating rather high dimensional integrals. In the following we consider a set of data leading to an analytically intractable likelihood-integral of dimension 16.

Let {z_i}_{i=1}^{d}, z_i = (z_{i1}, z_{i2}), denote the spatial coordinates of the observed responses {y_i}_{i=1}^{d}. For example, z_{i1} and z_{i2} could denote the longitude and latitude of the observation y_i. While y_i could represent a variety of response types, we focus here on the binomial case only. For instance, y_i could indicate whether or not a person living at location z_i has a certain disease or whether or not this person has a preference for a certain product. One of the modelling goals is to account for the possibility that two people living in close geographic proximity are more likely to share the same disease or the same preference. Let u = (u_1, ..., u_d) be a vector of random effects. Assume that, conditional on u_i, the responses y_i arise from the model

y_i | u_i ~ Binomial( n_i, exp(β + u_i) / (1 + exp(β + u_i)) ),    (16)

where β is an unknown regression coefficient. Assume furthermore that u follows a multivariate normal distribution with mean zero and a covariance structure such that the correlation between two random effects decays with the geographical distance between the associated two observations.

For example, assume that

Cov(u_i, u_j) = σ² exp{ -α ‖z_i - z_j‖ },    (17)

where ‖·‖ denotes the Euclidean norm. While different modelling alternatives exist (see, for example, Diggle et al., 1998), we will use the above model to investigate the efficiency of Quasi-Monte Carlo MCEM implementations for estimating the parameter vector θ = (β, σ², α).

We analyze a set of online retail data for the Washington, DC, area. Washington is a very diverse area with respect to a variety of aspects like socio-economic factors or infrastructure. This diversity is often expressed in regionally/locally strongly varying customer preferences. The data set consists of 39 customers who accessed the publisher's web site and either purchased the title in print form or in PDF. In addition to a customer's purchasing choice, the publisher also recorded the customer's geographical location. Geographical location can easily be obtained (at least approximately) through the customer's ZIP code. ZIP code information can then be transformed into longitudinal and latitudinal coordinates. After aggregating customers from the same ZIP code with the same preference, we obtained d = 16 distinct geographical locations. Let n_i denote the number of purchases from location i and let y_i denote the number of PDF purchases thereof. Figure 2 displays the data.

Figure 2 about here

Quasi-Monte Carlo has been found to improve upon the efficiency of classical Monte Carlo methods in a variety of settings. For instance, Bhat (2001) reports efficiency gains via the Halton sequence in a logit model for integral dimensions ranging from 1 to 5. Lemieux & L'Ecuyer (1998), on the other hand, consider integral dimensions as large as 120 and find efficiency improvements for the pricing of Asian options. In our example, the correlation structure of the random effects in equation (17) causes the likelihood function (and therefore also the E-step of the EM algorithm) to include an analytically intractable integral of dimension 16. Indeed, the (marginal) likelihood function for the model in (16) and (17) can be written as

L(θ; y) ∝ ∫ [ Π_{i=1}^{d} f(y_i | u_i; θ) ] exp{ -0.5 u^T Σ^{-1} u } / |Σ|^{1/2} du,    (18)

where u = (u_1, ..., u_16)^T contains the random effects corresponding to the 16 distinct locations and Σ is a matrix with elements σ_ij = Cov(u_i, u_j).
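As a concrete illustration of (17) and of the matrix Σ appearing in (18), a minimal sketch in Python (coordinate and parameter names are ours):

```python
import numpy as np

def exponential_covariance(coords, sigma2, alpha):
    """Covariance matrix (17): Cov(u_i, u_j) = sigma2 * exp(-alpha * ||z_i - z_j||)."""
    z = np.asarray(coords)                                   # shape (d, 2): longitude, latitude
    dists = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return sigma2 * np.exp(-alpha * dists)

# Example: covariance among d = 3 hypothetical locations for sigma2 = 1 and alpha = 1.
coords = [(0.0, 0.0), (0.5, 0.0), (0.0, 1.0)]
print(exponential_covariance(coords, sigma2=1.0, alpha=1.0))
```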

The evaluation of high dimensional integrals is computationally burdensome. We conducted a simulation study to investigate the efficiency of QMC approaches relative to that of classical Monte Carlo. Table 1 shows the results for three different QMCEM algorithms, using r = 5, r = 10 and r = 30 RQMC sequences, respectively. These are compared with an implementation of MCEM using classical Monte Carlo techniques. We can see that the Monte Carlo standard errors of the parameter estimates of θ = (β, σ², α) are very similar across the estimation methods, indicating that all 4 methods estimate the parameters with (on average) comparable accuracy. However, the total simulation effort required to obtain this accuracy differs greatly. Indeed, while classical Monte Carlo requires an average number of 800,200 simulated vectors (each of dimension 16!), it only takes 20,836 for QMC (using r = 5 RQMC sequences). This is a reduction in the total simulation effort by a factor of almost 40! It is also interesting to note that among the 3 different QMC approaches, choosing r = 30 RQMC sequences results in an (average) total simulation effort of 30,997 simulated vectors compared to only 20,836 for r = 5.

Table 1 about here

The reduction in the total simulation effort that is possible with the use of QMC methods is intriguing. The MCEM algorithm usually spends most of its simulation effort in the final iterations, when the algorithm is in the vicinity of the MLE. This has already been observed by, for example, Booth & Hobert (1999) or McCulloch (1997). The reason for this is the convergence behavior of the underlying deterministic EM algorithm. EM usually takes large steps in the early iterations, but the size of the steps reduces drastically as EM approaches θ̂. The step size in the tth iteration of EM can be thought of as the signal that is transmitted to MCEM. However, due to the error in the Monte Carlo approximation of the E-step in (7), MCEM receives only a noisy version of that signal. While the signal-to-noise ratio is large in the early iterations of MCEM, it declines continuously as MCEM approaches θ̂. This makes larger Monte Carlo sample sizes necessary in order to increase the accuracy of the approximation in (7) and therefore to reduce the noise. Table 1 shows that QMC methods, due to their superior ability to estimate an intractable integral more accurately, manage to reduce that noise with smaller sample sizes. The result is a smaller total simulation effort required by QMC.

Table 1 also shows that among the 3 different QMCEM algorithms, implementations that use fewer but longer low-discrepancy sequences result in a smaller total simulation effort than a large number of short sequences.

Indeed, the simulation effort for r = 30 RQMC sequences is about 50% higher than that for r = 5 or r = 10. We pointed out in Section 2 that for a given total simulation amount r·m, smaller values of r paired with larger values of m should result in a more accurate integral-estimate. On the other hand, the trade-off for using small values of r is a less accurate variance estimate in (14). In order to implement MCEM using randomized Halton sequences, a balance has to be achieved between a more accurate integral-estimate (i.e. less noise) and a more accurate variance estimate. In our example, we found this balance for values of r between 5 and 10. We also experimented with values smaller than 5 and frequently encountered problems with the numerical stability of the estimate of the covariance matrix in (14).

In the final paragraphs of this section we take a closer look at the noise of the QMCEM algorithm and compare it to classical MCEM. Figure 3 visualizes the Monte Carlo error for three different Monte Carlo estimation methods: classical Monte Carlo using random sampling (column 1), randomized Quasi-Monte Carlo with r = 5 RQMC sequences (column 2), and pure Quasi-Monte Carlo without randomization (column 3).

Figure 3 about here

We can see that for classical Monte Carlo, the average parameter update (thick solid line) is very volatile and has wide confidence bounds (dotted lines). This suggests that the Monte Carlo error is huge. This is in strong contrast to QMC. Indeed, for pure QMC sampling the parameter updates are significantly less volatile, with much tighter confidence bounds. Notice that we allocated the same simulation effort to both simulation methods! It takes classical MCEM much larger sample sizes to reduce the noise to the same level as under QMC sampling.

We have argued at the beginning of this paper that in order to implement MCEM in an automated way, the ability to estimate the error in the Monte Carlo approximation is essential. Randomized QMC methods provide this ability. While randomized Halton sequences have the low-discrepancy property (and thus estimate the integral with a higher accuracy than classical Monte Carlo), randomization may not come for free. Indeed, the second column of Figure 3 shows that, while the error reduction is still substantial compared to a classical Monte Carlo approach, the system is noisier than under pure QMC sampling.

5 Conclusion

In this paper we have demonstrated how recent advances in randomized Quasi-Monte Carlo can be used to implement the MCEM algorithm in an automated, data-driven way. The empirical investigations provide encouraging evidence that this Quasi-Monte Carlo EM algorithm can lead to significant efficiency gains over implementations using regular Monte Carlo methods. We focused our investigations in this work on the randomized Halton sequence only. Other randomized Quasi-Monte Carlo methods exist; see, for example, Owen (1998a) or L'Ecuyer & Lemieux (2002). It could be a rewarding topic for future research to investigate the benefits of different Quasi-Monte Carlo methods for the implementation of Monte Carlo EM (and also other stochastic estimation methods that are frequently encountered in the statistics literature).

Acknowledgements

All the simulations in this work are based on the programming language Ox of Doornik (2001).

References

Bhat, C. (2001). Quasi-random maximum simulated likelihood estimation for the mixed multinomial logit model. Transportation Research 35.

Booth, J. G. & Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society B 61.

Booth, J. G., Hobert, J. P. & Jank, W. (2001). A survey of Monte Carlo algorithms for maximizing the likelihood of a two-stage hierarchical model. Statistical Modelling 1.

Bouleau, N. & Lépingle, D. (1994). Numerical Methods for Stochastic Processes. New York: Wiley.

Boyles, R. A. (1983). On the convergence of the EM algorithm. Journal of the Royal Statistical Society B 45.

Breslow, N. E. & Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88.

Caflisch, R., Morokoff, W. & Owen, A. (1997). Valuation of mortgage-backed securities using Brownian bridges to reduce effective dimension. Journal of Computational Finance 1.

Chan, K. S. & Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association 90.

De Bruijn, N. G. (1958). Asymptotic Methods in Analysis. Amsterdam: North-Holland.

Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39.

Diggle, P. J., Tawn, J. A. & Moyeed, R. A. (1998). Model-based geostatistics. Journal of the Royal Statistical Society A 47.

Doornik, J. A. (2001). Ox: Object Oriented Matrix Programming. London: Timberlake.

Fang, K.-T. & Wang, Y. (1994). Number Theoretic Methods in Statistics. New York: Chapman & Hall.

Faure, H. (1982). Discrépance de suites associées à un système de numération (en dimension s). Acta Arithmetica 41.

Halton, J. H. (1960). On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numerische Mathematik 2.

Hobert, J. P. (2000). Hierarchical models: A current computational perspective. Journal of the American Statistical Association 95.

Kuk, A. Y. C. (1999). Laplace importance sampling for generalized linear mixed models. Journal of Statistical Computation and Simulation 63.

Lange, K. (1995). A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society B 57.

L'Ecuyer, P. & Lemieux, C. (2002). Recent advances in randomized Quasi-Monte Carlo methods. In Modeling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications, M. Dror, P. L'Ecuyer & F. Szidarovszki, eds. Kluwer Academic Publishers.

Lemieux, C. & L'Ecuyer, P. (1998). Efficiency improvement by lattice rules for pricing Asian options. In Proceedings of the 1998 Winter Simulation Conference. IEEE Press.

Levine, R. & Fan, J. (2003). An automated (Markov Chain) Monte Carlo EM algorithm. Tech. rep., San Diego State University.

Levine, R. A. & Casella, G. (2001). Implementations of the Monte Carlo EM algorithm. Journal of Computational and Graphical Statistics 10.

McCulloch, C. E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association 92.

Meng, X.-L. & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80.

Morokoff, W. J. & Caflisch, R. E. (1995). Quasi-Monte Carlo integration. Journal of Computational Physics 122.

Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. Philadelphia: SIAM.

Owen, A. (1998a). Scrambling Sobol and Niederreiter-Xing points. Journal of Complexity 14.

Owen, A. B. (1998b). Monte Carlo extension of Quasi-Monte Carlo. In 1998 Winter Simulation Conference Proceedings. New York: Springer.

Pagès, G. (1992). Van der Corput sequences, Kakutani transforms and one-dimensional numerical integration. Journal of Computational and Applied Mathematics 44.

Robert, C. P. & Casella, G. (1999). Monte Carlo Statistical Methods. New York: Springer.

Sobol, I. M. (1967). Distribution of points in a cube and approximate evaluation of integrals. U.S.S.R. Computational Mathematics and Mathematical Physics 7.

Wang, X. & Hickernell, F. J. (2000). Randomized Halton sequences. Mathematical and Computer Modelling 32.

Wei, G. C. G. & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association 85.

Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11.

Figure 1: 2500 points in the unit square: The upper plot ("Regular Monte Carlo") shows the result of regular Monte Carlo sampling, that is, 2500 points selected randomly. Random points tend to form clusters, oversampling the unit square in some places; this leads to gaps in other places, where the sample space is not explored at all. The lower plot ("Quasi-Monte Carlo") shows the result of Quasi-Monte Carlo sampling: 2500 points of a two-dimensional Halton sequence.

Figure 2: Geographical distribution of PDF purchases for Washington, DC: The upper plot ("Geographical Distribution of Data", longitude vs. latitude) shows the geographical borders of Washington, DC, as well as the geographical location of the 39 purchases of PDF or print. The lower plot ("Proportion of PDF Purchases per Location") displays the geographical scatter of the relative proportion of PDF purchases.

Figure 3: Monte Carlo error and Quasi-Monte Carlo error (columns: Classical Monte Carlo, Randomized Quasi-Monte Carlo, Pure Quasi-Monte Carlo; rows: Beta, Sigma, Alpha): Starting MCEM near the MLE, we performed 100 iterations using a fixed Monte Carlo sample size of r·m_t = 1000, t = 1, ..., 100. We repeated this experiment 50 times for a) MCEM using classical Monte Carlo sampling (column 1); b) randomized Quasi-Monte Carlo with r = 5 (column 2); c) pure Quasi-Monte Carlo without randomization, i.e. r = 1 (column 3). For each parameter value we plotted the average of the 50 iteration histories (thick, solid lines) as well as pointwise 95% confidence bounds (dotted lines).

Table 1: Spatial model: The table investigates the efficiency of Quasi-Monte Carlo implementations of MCEM for fitting geostatistical models. We investigate three different Quasi-Monte Carlo (QMC) algorithms using r = 5, 10 and 30 independent RQMC sequences, respectively. These RQMC sequences are obtained via randomized Halton sequences using Laplace importance sampling based on a t distribution with 10 degrees of freedom. We benchmark these three QMC algorithms against an implementation of MCEM based on regular Monte Carlo (MC) sampling using the same Laplace importance sampler. We start each algorithm from (β^(0), σ^2(0), α^(0)) = (0, 1, 1) and increase the length of the RQMC sequences according to Section 3.1 using α = 0.25 and κ = 0.2. The algorithm is terminated if the relative difference in two successive parameter updates falls below δ = 0.01 for 3 consecutive iterations. For each of the four MCEM implementations we performed this experiment 50 times, recording the final parameter values β_i, σ_i^2 and α_i and the total number of simulated vectors, N = Σ_{j=1}^{T_i} r·m_j, where T_i denotes the final iteration number (i = 1, ..., 50). The table displays the Monte Carlo average (AVG) and the Monte Carlo standard error (SE) for these values. For instance, for the regression parameter β it displays the average of the β estimates over the 50 replications and the Monte Carlo standard error s_β/√50, where s_β denotes the sample standard deviation over the 50 replicates.

                  β         σ^2       α         N
MC        AVG                                   800,200
          SE
QMC       AVG                                   20,836
(r=5)     SE
QMC       AVG
(r=10)    SE
QMC       AVG                                   30,997
(r=30)    SE


More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like

More information

Estimating Gaussian Mixture Densities with EM A Tutorial

Estimating Gaussian Mixture Densities with EM A Tutorial Estimating Gaussian Mixture Densities with EM A Tutorial Carlo Tomasi Due University Expectation Maximization (EM) [4, 3, 6] is a numerical algorithm for the maximization of functions of several variables

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

K-ANTITHETIC VARIATES IN MONTE CARLO SIMULATION ISSN k-antithetic Variates in Monte Carlo Simulation Abdelaziz Nasroallah, pp.

K-ANTITHETIC VARIATES IN MONTE CARLO SIMULATION ISSN k-antithetic Variates in Monte Carlo Simulation Abdelaziz Nasroallah, pp. K-ANTITHETIC VARIATES IN MONTE CARLO SIMULATION ABDELAZIZ NASROALLAH Abstract. Standard Monte Carlo simulation needs prohibitive time to achieve reasonable estimations. for untractable integrals (i.e.

More information

Simulated Annealing for Constrained Global Optimization

Simulated Annealing for Constrained Global Optimization Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Lossless Online Bayesian Bagging

Lossless Online Bayesian Bagging Lossless Online Bayesian Bagging Herbert K. H. Lee ISDS Duke University Box 90251 Durham, NC 27708 herbie@isds.duke.edu Merlise A. Clyde ISDS Duke University Box 90251 Durham, NC 27708 clyde@isds.duke.edu

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

The Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic

The Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic he Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic Hee Min Choi and James P. Hobert Department of Statistics University of Florida August 013 Abstract One of the most widely

More information

A Review of Basic Monte Carlo Methods

A Review of Basic Monte Carlo Methods A Review of Basic Monte Carlo Methods Julian Haft May 9, 2014 Introduction One of the most powerful techniques in statistical analysis developed in this past century is undoubtedly that of Monte Carlo

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study

Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study Journal: Manuscript ID: LSSP-00-0.R Manuscript Type: Original Paper Date Submitted by the Author: -May-0 Complete List

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Note Set 5: Hidden Markov Models

Note Set 5: Hidden Markov Models Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

Gaussian Mixture Model

Gaussian Mixture Model Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????),??,?, pp. 1 15 C???? Biometrika Trust Printed in

More information

Accounting for Missing Values in Score- Driven Time-Varying Parameter Models

Accounting for Missing Values in Score- Driven Time-Varying Parameter Models TI 2016-067/IV Tinbergen Institute Discussion Paper Accounting for Missing Values in Score- Driven Time-Varying Parameter Models André Lucas Anne Opschoor Julia Schaumburg Faculty of Economics and Business

More information

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17 MCMC for big data Geir Storvik BigInsight lunch - May 2 2018 Geir Storvik MCMC for big data BigInsight lunch - May 2 2018 1 / 17 Outline Why ordinary MCMC is not scalable Different approaches for making

More information

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt.

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt. SINGAPORE SHANGHAI Vol TAIPEI - Interdisciplinary Mathematical Sciences 19 Kernel-based Approximation Methods using MATLAB Gregory Fasshauer Illinois Institute of Technology, USA Michael McCourt University

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

The square root rule for adaptive importance sampling

The square root rule for adaptive importance sampling The square root rule for adaptive importance sampling Art B. Owen Stanford University Yi Zhou January 2019 Abstract In adaptive importance sampling, and other contexts, we have unbiased and uncorrelated

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

Supplementary Note on Bayesian analysis

Supplementary Note on Bayesian analysis Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information