Quasi-Monte Carlo Sampling to improve the Efficiency of Monte Carlo EM
Wolfgang Jank
Department of Decision and Information Technologies
University of Maryland
College Park, MD

November 17, 2003

Abstract

In this paper we investigate an efficient implementation of the Monte Carlo EM algorithm based on Quasi-Monte Carlo sampling. The Monte Carlo EM algorithm is a stochastic version of the deterministic EM (Expectation-Maximization) algorithm in which an intractable E-step is replaced by a Monte Carlo approximation. Quasi-Monte Carlo methods produce deterministic sequences of points that can significantly improve the accuracy of Monte Carlo approximations over purely random sampling. One drawback of deterministic Quasi-Monte Carlo methods is that it is generally difficult to determine the magnitude of the approximation error. However, in order to implement the Monte Carlo EM algorithm in an automated way, the ability to measure this error is fundamental. Recent developments of randomized Quasi-Monte Carlo methods can overcome this drawback. We investigate the implementation of an automated, data-driven Monte Carlo EM algorithm based on randomized Quasi-Monte Carlo methods. We apply this algorithm to a geostatistical model of online purchases and find that it can significantly decrease the total simulation effort, thus showing great potential for improving upon the efficiency of the classical Monte Carlo EM algorithm.

Key words and phrases: Monte Carlo error; low-discrepancy sequence; Halton sequence; EM algorithm; geostatistical model.
1 Introduction

The Expectation-Maximization (EM) algorithm (Dempster et al., 1977) is a popular tool in statistics and many other fields. One limitation to the use of EM is, however, that quite often the E-step of the algorithm involves an analytically intractable, sometimes high dimensional integral. Hobert (2000), for example, considers a model for which the E-step involves intractable integrals of dimension twenty. The Monte Carlo EM (MCEM) algorithm, proposed by Wei & Tanner (1990), estimates this intractable integral with an empirical average based on simulated data. Typically, the simulated data is obtained by producing random draws from the distribution commanded by EM. By the law of large numbers, this integral-estimate can be made arbitrarily accurate by increasing the size of the simulated data. The MCEM algorithm typically requires a very high accuracy, especially at the later iterations. Booth & Hobert (1999), for example, report sample sizes of over 66,000 at convergence. This suggests that the overall efficiency of MCEM could be improved by using simulation methods that achieve a high accuracy in the integral-estimate with smaller sample sizes. Recent research has provided evidence that entirely random draws do not necessarily result in the most efficient use of the simulated data. In particular, one criticism of random draws is that they often do not explore the sample space well (Morokoff & Caflisch, 1995; Caflisch et al., 1997). For instance, points drawn at random tend to form clusters, which leads to gaps where the sample space is not explored at all (see Figure 1 for illustration). This criticism has led to the development of a variety of deterministic methods that provide for a better spread of the sample points. These deterministic methods are often classified as Quasi-Monte Carlo (QMC) methods.
Theoretical as well as empirical research has shown that QMC methods can significantly increase the accuracy of the integral-estimate over random draws.

Figure 1 about here

In this paper we investigate an implementation of the MCEM algorithm based on QMC methods. Wei & Tanner (1990) point out that for an efficient implementation, the size of the simulated data should be chosen small at the initial stage but increased successively as the algorithm moves along. Early versions of the method require a manual, user-determined increase of the sample size, for instance, by allocating the amount of data to be simulated in each iteration before the start of the algorithm (e.g. McCulloch, 1997). Implementations of MCEM that determine the necessary sample size in an automated, data-driven fashion have been developed only recently (see Booth & Hobert, 1999; Levine & Casella, 2001; Levine & Fan, 2003). Automated implementations of MCEM base the decision to increase the sample size on the magnitude of the error in the integral-approximation. In their seminal work, Booth & Hobert (1999) use statistical methods to estimate this error when the simulated data is generated at random. However, since QMC methods are deterministic in nature, statistical methods do not apply. Moreover, determining the error of the QMC integral-estimate analytically can be extremely hard (Caflisch et al., 1997). Recently, the development of randomized QMC methods has overcome this drawback. Randomized Quasi-Monte Carlo (RQMC) methods combine the benefits of deterministic sampling methods, which achieve a more uniform exploration of the sample space, with the statistical advantages of random draws. A survey of recent advances in RQMC methods can be found in L'Ecuyer & Lemieux (2002). In this work we implement an automated MCEM algorithm based on RQMC methods. Specifically, we demonstrate how to obtain a QMC sample from the distribution commanded by EM, and we use the ideas of RQMC sampling to measure the error of the integral-estimate in every iteration of the algorithm. We implement this Quasi-Monte Carlo EM (QMCEM) algorithm within the framework of the automated MCEM formulation proposed by Booth & Hobert (1999).

The remainder of this paper is organized as follows. In Section 2 we briefly motivate the ideas surrounding QMC and RQMC. In Section 3 we explain how RQMC methods can be used to implement QMCEM in an automated, data-driven fashion. We apply this algorithm to a geostatistical model of online purchases in Section 4 and conclude with final remarks in Section 5.
2 Quasi-Monte Carlo Sampling

Quasi-Monte Carlo methods can be regarded as a deterministic counterpart to classical Monte Carlo. Suppose we want to evaluate an (analytically intractable) integral

I = \int_{C^d} f(x) \, dx    (1)
over the d-dimensional unit cube, C^d := [0, 1]^d. Classical Monte Carlo integration randomly selects points x_k ~ Uniform(C^d), k = 1, ..., m, and approximates (1) by the empirical average

\tilde{I} = \frac{1}{m} \sum_{k=1}^{m} f(x_k).    (2)

Quasi-Monte Carlo methods, on the other hand, select the points deterministically. Specifically, QMC methods produce a deterministic sequence of points that provides the best-possible spread in C^d. These deterministic sequences are often referred to as low-discrepancy sequences (see, for example, Niederreiter, 1992; Fang & Wang, 1994). A variety of different low-discrepancy sequences exist. Examples include the Halton sequence (Halton, 1960), the Sobol' sequence (Sobol', 1967), the Faure sequence (Faure, 1982), and the Niederreiter sequence (Niederreiter, 1992), but this list is not exhaustive. In this work we focus our attention on the Halton sequence since it is conceptually very appealing.

2.1 Halton Sequences

Let b be a prime number. Then any integer k, k >= 0, can be written in base-b representation as

k = d_j b^j + d_{j-1} b^{j-1} + \cdots + d_1 b + d_0,

where d_i \in {0, 1, ..., b-1} for i = 0, 1, ..., j. Define the base-b radical inverse function, \phi_b(k), as

\phi_b(k) = d_0 b^{-1} + d_1 b^{-2} + \cdots + d_j b^{-(j+1)}.

Notice that for every integer k >= 0, \phi_b(k) \in [0, 1). The kth element of the Halton sequence is obtained via the radical inverse function evaluated at k. Specifically, if b_1, ..., b_d are d different prime numbers, then a d-dimensional Halton sequence of length m is given by {x_1, ..., x_m}, where the kth element of the sequence is

x_k = [\phi_{b_1}(k-1), ..., \phi_{b_d}(k-1)]^T,  k = 1, ..., m.    (3)

(See Halton (1960) or Wang & Hickernell (2000) for more details.)

Notice that the Halton sequence does not need to be started at the origin. Indeed, for any d-vector of non-negative integers, n = (n_1, ..., n_d)^T, say, the Halton sequence with the first elements skipped,

x_k = [\phi_{b_1}(n_1 + k - 1), ..., \phi_{b_d}(n_d + k - 1)]^T,  k = 1, ..., m,    (4)
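The radical inverse and the Halton construction in (3) are straightforward to implement. The following is a minimal Python sketch; the function names `radical_inverse` and `halton` are our own, not from the paper.

```python
import numpy as np

def radical_inverse(k, b):
    """Base-b radical inverse phi_b(k): reflect the base-b digits
    of k about the radix point, yielding a value in [0, 1)."""
    inv, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, b)  # strip the lowest base-b digit
        denom *= b
        inv += digit / denom
    return inv

def halton(m, primes):
    """First m points of the d-dimensional Halton sequence (3),
    one distinct prime base per coordinate, starting at the origin."""
    return np.array([[radical_inverse(k, b) for b in primes]
                     for k in range(m)])

points = halton(2500, primes=[2, 3])  # the 2-d point set shown in Figure 1
```

For example, in base 2 the integers 1, 2, 3 map to 0.5, 0.25, 0.75, which is how the sequence progressively refines the unit interval.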
remains a low-discrepancy sequence (see Pagès, 1992; Bouleau & Lépingle, 1994). We will refer to the sequence defined by (4) as a Halton sequence with starting point n. Figure 1 shows the first 2500 elements of a two-dimensional Halton sequence with n = (0, 0)^T.

2.2 Randomized Quasi-Monte Carlo

Owen (1998b) points out that the main (practical) disadvantage of QMC is that determining the accuracy of the integral-estimate in (2) is typically very complicated, if not impossible. Moreover, since QMC methods are based on deterministic sequences, statistical procedures for error estimation do not apply. This drawback has led to the development of randomized Quasi-Monte Carlo (RQMC) methods. L'Ecuyer & Lemieux (2002) suggest that any RQMC sequence should have the following two properties: 1) every element of the sequence has a uniform distribution over C^d; 2) the low-discrepancy property of the sequence is preserved under the randomization. The first property guarantees that the approximation \tilde{I} in (2) is an unbiased estimate of the integral in (1). Moreover, one can estimate its variance by generating r independent copies of \tilde{I} (which is typically done by generating r independent sequences x_1^{(j)}, ..., x_m^{(j)}, j = 1, ..., r). Given a desired total simulation amount N = rm, smaller values of r (paired with a larger value of m) should result in a better accuracy of the integral-estimate, since this takes better advantage of the low-discrepancy property of each sequence. At the extreme, taking r = N and m = 1 simply reproduces classical Monte Carlo estimation.

2.3 Randomized Halton Sequences

Recall that, regardless of the starting point, the Halton sequence remains a low-discrepancy sequence. Wang & Hickernell (2000) use this fact to show that if the Halton sequence is started at a random point, x_1 ~ Uniform(C^d), then it satisfies the RQMC properties 1) and 2) from Subsection 2.2.
In the following sections, we will use RQMC sampling based on the randomized Halton sequence.
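A random-start Halton sequence can be sketched as follows. This is our own illustrative implementation: we approximate a uniform starting point by drawing, for each coordinate, a random integer offset n_j with a fixed number of random base-b_j digits (the parameter `digits` is an implementation choice, not from the paper).

```python
import random

def radical_inverse(k, b):
    # Base-b radical inverse phi_b(k) in [0, 1).
    inv, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, b)
        denom *= b
        inv += digit / denom
    return inv

def randomized_halton(m, primes, rng, digits=32):
    """Halton sequence with a random starting point n, as in eq. (4):
    coordinate j runs phi_{b_j}(n_j), phi_{b_j}(n_j + 1), ...  Drawing
    n_j uniformly from {0, ..., b_j**digits - 1} makes the first point
    (essentially) uniform while keeping the low-discrepancy structure."""
    starts = [rng.randrange(b ** digits) for b in primes]
    return [[radical_inverse(n + k, b) for n, b in zip(starts, primes)]
            for k in range(m)]

rng = random.Random(2003)
# r = 3 independent randomized copies of a length-500 sequence in [0,1)^2
copies = [randomized_halton(500, [2, 3], rng) for _ in range(3)]
```

Each call produces an independently randomized copy, which is exactly what the pooled error estimation of Section 3 requires.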
3 Quasi-Monte Carlo EM

The Expectation-Maximization (EM) algorithm (Dempster et al., 1977) is an iterative procedure useful to approximate the maximum likelihood estimator (MLE) in incomplete data problems. Let y be a vector of observed data, let u be a vector of unobserved data or random effects and let θ denote a vector of parameters. Furthermore, let f(y, u; θ) denote the joint density of the complete data, (y, u). Let

L(θ; y) = \int f(y, u; θ) \, du

denote the (marginal) likelihood function for this model. The MLE, \hat{θ}, maximizes L( · ; y).

In each iteration, the EM algorithm performs an expectation and a maximization step. Let θ^{(t-1)} denote the current parameter value. Then, in the tth iteration of the algorithm, the E-step computes the conditional expectation of the complete data log-likelihood, conditional on the observed data and the current parameter value,

Q(θ | θ^{(t-1)}) = E[ \log f(y, u; θ) | y; θ^{(t-1)} ].    (5)

The tth EM update, θ^{(t)}, maximizes (5). That is, θ^{(t)} satisfies

Q(θ^{(t)} | θ^{(t-1)}) >= Q(θ | θ^{(t-1)})    (6)

for all θ in the parameter space. This is also known as the M-step. The M-step is often implemented using standard numerical methods like Newton-Raphson (see Lange, 1995). Solutions to overcome a difficult M-step have been proposed in, for example, Meng & Rubin (1993). Given an initial value θ^{(0)}, the EM algorithm produces a sequence {θ^{(0)}, θ^{(1)}, θ^{(2)}, ...} that, under regularity conditions (see Boyles, 1983; Wu, 1983), converges to \hat{θ}.

In this work we focus on the situation when the E-step does not have a closed form solution. Wei & Tanner (1990) proposed to approximate an analytically intractable expectation in (5) by the empirical average

Q(θ | θ^{(t-1)}) ≈ \hat{Q}(θ | θ^{(t-1)}; u_1, ..., u_{m_t}) = \frac{1}{m_t} \sum_{k=1}^{m_t} \log f(y, u_k; θ),    (7)

where u_1, ..., u_{m_t} are simulated from the conditional distribution f(u | y; θ^{(t-1)}). Then, by the law of large numbers, \hat{Q}(θ | θ^{(t-1)}) will be a reasonable approximation to Q(θ | θ^{(t-1)}) if m_t is large enough.
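One MCEM iteration based on (7) can be sketched on a deliberately simple toy model of our own choosing (not the model of this paper): y | u ~ N(u, 1) and u ~ N(θ, 1), for which f(u | y; θ) is N((y + θ)/2, 1/2) and the exact EM update is θ^{(t)} = E[u | y; θ^{(t-1)}].

```python
import numpy as np

def q_hat(theta, y, u_draws):
    """Monte Carlo E-step, eq. (7): average the complete-data
    log-likelihood log f(y, u; theta) over draws from f(u | y; theta_prev).
    For the toy model, log f(y,u;theta) = -0.5[(y-u)^2 + (u-theta)^2] + const."""
    return np.mean(-0.5 * ((y - u_draws) ** 2 + (u_draws - theta) ** 2))

rng = np.random.default_rng(0)
y, theta_prev = 1.0, 0.0
# Draws from the exact conditional f(u | y; theta_prev) = N((y+theta_prev)/2, 1/2).
u_draws = rng.normal((y + theta_prev) / 2, np.sqrt(0.5), size=5000)

# Crude grid-search M-step; maximizer of q_hat is the mean of the draws.
theta_grid = np.linspace(-2, 2, 81)
theta_next = theta_grid[int(np.argmax([q_hat(t, y, u_draws) for t in theta_grid]))]
```

Here `theta_next` approximates the exact EM update (y + θ_prev)/2 = 0.5, up to the Monte Carlo error that the sample-size rules of Section 3.1 are designed to control.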
We consider a modification of (7) suitable for RQMC sampling. Let u_1^{(j)}, ..., u_{m_t}^{(j)}, j = 1, ..., r, be r independent RQMC sequences of length m_t, each simulated from f(u | y; θ^{(t-1)}). (The details of how to simulate a RQMC sequence from f(u | y; θ^{(t-1)}) are deferred until Subsection 3.2.) Then, an unbiased estimate of (5) is given by the pooled estimate

\hat{Q}_P(θ | θ^{(t-1)}) = \frac{1}{r} \sum_{j=1}^{r} \hat{Q}^{(j)}(θ | θ^{(t-1)}),    (8)

where \hat{Q}^{(j)}(θ | θ^{(t-1)}) = \hat{Q}(θ | θ^{(t-1)}; u_1^{(j)}, ..., u_{m_t}^{(j)}) as in (7). The tth Quasi-Monte Carlo EM (QMCEM) update, \tilde{θ}^{(t)}, maximizes \hat{Q}_P( · | θ^{(t-1)}).

3.1 Increasing the length of the RQMC sequences

We have pointed out earlier that the Monte Carlo sample sizes m_t should be increased successively as the algorithm moves along. In fact, Booth et al. (2001) argue that MCEM will never converge if m_t is held fixed across iterations because of a persevering Monte Carlo error (see also Chan & Ledolter, 1995). While earlier versions of the method choose the Monte Carlo sample sizes in a deterministic fashion before the start of the algorithm (e.g. McCulloch, 1997), a deterministic allocation of Monte Carlo resources that works well in one problem may result in a very inefficient (or inaccurate) algorithm in another problem. Thus, data-dependent (and user-independent) sample size rules are necessary in order to implement MCEM in an automated way.

Booth & Hobert (1999) base the decision of a sample size increase on the noise in the parameter updates (see also Levine & Casella, 2001; Levine & Fan, 2003). Let θ^{(t-1)} denote the current QMCEM parameter value and let \tilde{θ}^{(t)} denote the maximizer of \hat{Q}_P( · | θ^{(t-1)}) in (8) based on r independent RQMC sequences, each of length m_t. Thus, \tilde{θ}^{(t)} satisfies

\hat{F}_P(\tilde{θ}^{(t)} | θ^{(t-1)}) = 0,    (9)

where we define \hat{F}_P(θ | θ') = ∂\hat{Q}_P(θ | θ')/∂θ. Let θ^{(t)} denote the parameter update of the deterministic EM algorithm, that is, θ^{(t)} satisfies

F(θ^{(t)} | θ^{(t-1)}) = 0,    (10)

where, in similar fashion to above, we define F(θ | θ') = ∂Q(θ | θ')/∂θ.
Thus, a first-order Taylor expansion of \hat{F}_P(\tilde{θ}^{(t)} | θ^{(t-1)}) about θ^{(t)} yields

(\tilde{θ}^{(t)} - θ^{(t)})^T \hat{S}_P(θ^{(t)} | θ^{(t-1)}) ≈ \hat{F}_P(θ^{(t)} | θ^{(t-1)})^T,    (11)
where we define the matrix \hat{S}_P(θ | θ') = -∂²\hat{Q}_P(θ | θ')/∂θ∂θ^T. Under RQMC sampling, \hat{Q}_P is an unbiased estimate of Q. Assuming mild regularity conditions, it follows that

E[ \hat{F}_P(θ^{(t)} | θ^{(t-1)}) ] = F(θ^{(t)} | θ^{(t-1)}) = 0.    (12)

Therefore, the expected value of \tilde{θ}^{(t)} is θ^{(t)} and its variance-covariance matrix is given by

Var(\tilde{θ}^{(t)}) = [\hat{S}_P(θ^{(t)} | θ^{(t-1)})]^{-1} Var(\hat{F}_P(θ^{(t)} | θ^{(t-1)})) [\hat{S}_P(θ^{(t)} | θ^{(t-1)})]^{-1}.    (13)

Under regular Monte Carlo sampling it follows that, for a large enough Monte Carlo sample size, \tilde{θ}^{(t)} is approximately normally distributed with mean and variance specified above. Under RQMC sampling, however, the accuracy of the normal approximation may depend on the number r of independent RQMC sequences. In Section 4 we consider a range of values for r in order to investigate its effect on QMCEM. In our implementations we estimate Var(\tilde{θ}^{(t)}) by substituting \tilde{θ}^{(t)} for θ^{(t)} in (13) and estimate Var(\hat{F}_P(θ^{(t)} | θ^{(t-1)})) via

\frac{1}{r^2} \sum_{j=1}^{r} \left( \frac{∂}{∂θ} \hat{Q}^{(j)}(θ | θ^{(t-1)}) \right) \left( \frac{∂}{∂θ} \hat{Q}^{(j)}(θ | θ^{(t-1)}) \right)^T \Big|_{θ = \tilde{θ}^{(t)}}.    (14)

Larger values of r should result in a more accurate estimate of Var(\tilde{θ}^{(t)}). However, we also pointed out that smaller values of r should result in a better accuracy of the Monte Carlo estimate in (8), since this takes better advantage of the low-discrepancy property of each individual sequence u_1^{(j)}, ..., u_{m_t}^{(j)}. We investigate the impact of this trade-off on the overall efficiency of the method in Section 4.

The QMCEM algorithm proceeds as follows. Following Booth & Hobert's recommendation, we measure the noise in the QMCEM update \tilde{θ}^{(t)} by constructing a (1-α)·100% confidence ellipsoid about the deterministic EM update θ^{(t)}, using the normal approximation for \tilde{θ}^{(t)}. If this ellipsoid contains the previous parameter value θ^{(t-1)}, then we conclude that the system is too noisy and we increase the length m_t of the RQMC sequences. Booth et al. (2001) argue that the sample sizes should be increased at an exponential rate.
Thus, we increase the sample size to m_{t+1} := (1 + κ)m_t, where κ is a small number, typically κ = 0.2, 0.3, or 0.4. Since stochastic algorithms like MCEM can satisfy deterministic stopping rules purely by chance, it is recommended to continue the method until the stopping rule is satisfied for several consecutive iterations (see also Booth & Hobert, 1999). Thus, we stop the algorithm when the relative change in two successive parameter updates is smaller than some small number δ, δ > 0, for 3 consecutive iterations.

3.2 Laplace Importance Sampling to generate RQMC sequences

Recall that the pooled estimate in (8) is based on r independent RQMC sequences u_1^{(j)}, ..., u_{m_t}^{(j)}, j = 1, ..., r, simulated from f(u | y; θ^{(t-1)}). In this section we demonstrate how to generate randomized Halton sequences using Laplace importance sampling. Laplace importance sampling has proven useful for drawing approximate samples from f(u | y; θ) in many instances (see Booth & Hobert, 1999; Kuk, 1999).

Laplace importance sampling attempts to find an importance sampling distribution whose mean and variance match the mode and curvature of f(u | y; θ). More specifically, suppressing the dependence on y, let

l(u; θ) = \log f(y, u; θ)    (15)

denote the complete data log-likelihood and let l'(u; θ) and l''(u; θ) denote its first and second derivatives in u, respectively. Suppose that ũ denotes the maximizer of l, satisfying l'(ũ; θ) = 0. Then the Laplace approximations to the mean and variance of f(u | y; θ) are μ(θ) = ũ and Σ(θ) = {-l''(ũ; θ)}^{-1}, respectively (e.g. De Bruijn, 1958). Booth & Hobert (1999) as well as Kuk (1999) propose to use a multivariate normal or multivariate t importance sampling distribution, shifted and scaled by μ(θ) and Σ(θ), respectively. Let f_Lap(u | y; θ) denote the resulting Laplace importance sampling distribution.

Recall that by RQMC property 1), every element of a RQMC sequence has a uniform distribution over C^d. Let x_k be the kth element of a randomized Halton sequence. Using a suitable transformation (e.g. Robert & Casella, 1999), we can generate a d-vector of i.i.d. normal or t variates. Shifting and scaling this vector by μ(θ) and Σ(θ) results in a draw u_k from f_Lap(u | y; θ).
Thus, using r independent randomized Halton sequences of length m_t, x_1^{(j)}, ..., x_{m_t}^{(j)}, j = 1, ..., r, we obtain r independent sequences u_1^{(j)}, ..., u_{m_t}^{(j)} from f_Lap(u | y; θ). Booth & Hobert (1999) and Kuk (1999) successfully use Laplace importance sampling for fitting generalized linear mixed models. In the following we apply the method to an application of generalized linear mixed models to data exhibiting spatial correlation.
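The uniform-to-normal transformation step can be sketched as follows, for the multivariate normal variant of the importance density. This is our own sketch, assuming μ and Σ have already been computed by the Laplace approximation; the standard normal inverse CDF plays the role of the "suitable transformation."

```python
import random
from statistics import NormalDist
import numpy as np

def radical_inverse(k, b):
    # Base-b radical inverse phi_b(k) in [0, 1).
    inv, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, b)
        denom *= b
        inv += digit / denom
    return inv

def laplace_importance_draws(m, mu, Sigma, primes, rng, digits=32):
    """Map a random-start Halton sequence into draws from the normal
    Laplace importance density N(mu, Sigma): each uniform coordinate
    goes through the standard normal inverse CDF, and the resulting
    vector is shifted by mu and scaled by a Cholesky factor of Sigma."""
    starts = [rng.randrange(b ** digits) for b in primes]
    L = np.linalg.cholesky(Sigma)  # Sigma = L L^T
    draws = []
    for k in range(m):
        u = [radical_inverse(n + k, b) for n, b in zip(starts, primes)]
        z = np.array([NormalDist().inv_cdf(min(max(ui, 1e-12), 1 - 1e-12))
                      for ui in u])  # clamp away from 0 and 1
        draws.append(mu + L @ z)
    return np.array(draws)

rng = random.Random(17)
mu = np.zeros(2)                              # hypothetical Laplace mode
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])    # hypothetical curvature
u_draws = laplace_importance_draws(2000, mu, Sigma, [2, 3], rng)
```

Calling this r times with independent `rng` states yields the r independent sequences u_1^{(j)}, ..., u_{m_t}^{(j)} entering the pooled estimate (8).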
4 Application: A Geostatistical Model of Online Purchases

In this section we consider sales data from an online book publisher and retailer. The publisher sells online the titles it publishes in print form as well as, more recently, also in PDF form. The publisher has good reason to believe that a customer's preference for either print or PDF form varies significantly with his or her geographical location. In fact, since the PDF form is directly downloaded from the publisher's web site, it requires a reliable and typically fast internet connection. However, the availability of reliable internet connections varies greatly across different regions. Moreover, directly downloaded PDF files provide content immediately, without having to wait for shipment as in the case of a printed book. Thus, shipping times can also influence a customer's preference. The preference can also be affected by a customer's access to good quality printers or his/her technology readiness, all of which often exhibit strong local variability.

Data exhibiting spatial correlation can be modelled using generalized linear mixed models (e.g. Breslow & Clayton, 1993). Diggle et al. (1998) refer to these spatial applications of generalized linear mixed models as model-based geostatistics. These spatial mixed models are challenging from a computational point of view since they often involve approximating rather high dimensional integrals. In the following we consider a set of data leading to an analytically intractable likelihood integral of dimension 16.

Let {z_i}_{i=1}^{d}, z_i = (z_{i1}, z_{i2}), denote the spatial coordinates of the observed responses {y_i}_{i=1}^{d}. For example, z_{i1} and z_{i2} could denote the longitude and latitude of the observation y_i. While y_i could represent a variety of response types, we focus here on the binomial case only.
For instance, y_i could indicate whether or not a person living at location z_i has a certain disease or whether or not this person has a preference for a certain product. One of the modelling goals is to account for the possibility that two people living in close geographic proximity are more likely to share the same disease or the same preference.

Let u = (u_1, ..., u_d) be a vector of random effects. Assume that, conditional on u_i, the responses y_i arise from the model

y_i | u_i ~ Binomial( n_i, \frac{exp(β + u_i)}{1 + exp(β + u_i)} ),    (16)

where β is an unknown regression coefficient. Assume furthermore that u follows a multivariate normal distribution with mean zero and covariance structure such that the correlation between two
random effects decays with the geographical distance between the associated two observations. For example, assume that

Cov(u_i, u_j) = σ² exp{ -α ||z_i - z_j|| },    (17)

where || · || denotes the Euclidean norm. While different modelling alternatives exist (see, for example, Diggle et al., 1998), we will use the above model to investigate the efficiency of Quasi-Monte Carlo MCEM implementations for estimating the parameter vector θ = (β, σ², α).

We analyze a set of online retail data for the Washington, DC, area. Washington is a very diverse area with respect to a variety of aspects like socio-economic factors and infrastructure. This diversity is often expressed in regionally and locally strongly varying customer preferences. The data set consists of 39 customers who accessed the publisher's web site and either purchased the title in print form or in PDF. In addition to a customer's purchasing choice, the publisher also recorded the customer's geographical location. Geographical location can easily be obtained (at least approximately) through the customer's ZIP code. ZIP code information can then be transformed into longitudinal and latitudinal coordinates. After aggregating customers from the same ZIP code with the same preference, we obtained d = 16 distinct geographical locations. Let n_i denote the number of purchases from location i and let y_i denote the number of PDF purchases thereof. Figure 2 displays the data.

Figure 2 about here

Quasi-Monte Carlo has been found to improve upon the efficiency of classical Monte Carlo methods in a variety of settings. For instance, Bhat (2001) reports efficiency gains via the Halton sequence in a logit model for integral dimensions ranging from 1 to 5. Lemieux & L'Ecuyer (1998), on the other hand, consider integral dimensions as large as 120 and find efficiency improvements for the pricing of Asian options.
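Building the covariance matrix implied by (17) is a small exercise in pairwise distances; a minimal sketch (the coordinates and parameter values below are hypothetical, chosen only for illustration):

```python
import numpy as np

def exp_covariance(coords, sigma2, alpha):
    """Covariance matrix from eq. (17):
    Cov(u_i, u_j) = sigma^2 * exp(-alpha * ||z_i - z_j||),
    with ||.|| the Euclidean distance between site coordinates."""
    z = np.asarray(coords, dtype=float)
    diff = z[:, None, :] - z[None, :, :]       # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    return sigma2 * np.exp(-alpha * dist)

# Three example sites (longitude, latitude) and hypothetical parameters.
Sigma = exp_covariance([(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)],
                       sigma2=1.5, alpha=0.8)
```

The diagonal equals σ² (zero distance), and off-diagonal entries decay exponentially in distance, so nearby locations receive strongly correlated random effects, as the model intends.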
In our example, the correlation structure of the random effects in equation (17) causes the likelihood function (and therefore also the E-step of the EM algorithm) to include an analytically intractable integral of dimension 16. Indeed, the (marginal) likelihood function for the model in (16) and (17) can be written as

L(θ; y) ∝ \int \left( \prod_{i=1}^{d} f(y_i | u_i; θ) \right) \frac{ exp{ -0.5 u^T Σ^{-1} u } }{ |Σ|^{1/2} } \, du,    (18)

where u = (u_1, ..., u_{16})^T contains the random effects corresponding to the 16 distinct locations and Σ is a matrix with elements σ_{ij} = Cov(u_i, u_j).
The evaluation of high dimensional integrals is computationally burdensome. We conducted a simulation study to investigate the efficiency of QMC approaches relative to that of classical Monte Carlo. Table 1 shows the results for three different QMCEM algorithms, using r = 5, r = 10 and r = 30 RQMC sequences, respectively. This compares to an implementation of MCEM using classical Monte Carlo techniques. We can see that the Monte Carlo standard errors of the parameter estimates of θ = (β, σ², α) are very similar across the estimation methods, indicating that all 4 methods estimate the parameters with (on average) comparable accuracy. However, the total simulation effort required to obtain this accuracy differs greatly. Indeed, while classical Monte Carlo requires an average number of 800,200 simulated vectors (each of dimension 16!), it only takes 20,836 for QMC (using r = 5 RQMC sequences). This is a reduction in the total simulation effort by a factor of almost 40! It is also interesting to note that among the 3 different QMC approaches, choosing r = 30 RQMC sequences results in an (average) total simulation effort of 30,997 simulated vectors, compared to only 20,836 for r = 5.

Table 1 about here

The reduction in the total simulation effort that is possible with the use of QMC methods is intriguing. The MCEM algorithm usually spends most of its simulation effort in the final iterations, when the algorithm is in the vicinity of the MLE. This has already been observed by, for example, Booth & Hobert (1999) and McCulloch (1997). The reason for this is the convergence behavior of the underlying deterministic EM algorithm. EM usually takes large steps in the early iterations, but the size of the steps reduces drastically as EM approaches \hat{θ}. The step size in the tth iteration of EM can be thought of as the signal that is transmitted to MCEM.
However, due to the error in the Monte Carlo approximation of the E-step in (7), MCEM receives only a noisy version of that signal. While the signal-to-noise ratio is large in the early iterations of MCEM, it declines continuously as MCEM approaches \hat{θ}. This makes larger Monte Carlo sample sizes necessary in order to increase the accuracy of the approximation in (7) and therefore to reduce the noise. Table 1 shows that QMC methods, due to their superior ability to estimate an intractable integral accurately, manage to reduce that noise with smaller sample sizes. The result is a smaller total simulation effort required by QMC. Table 1 also shows that among the 3 different QMCEM algorithms, implementations that use fewer but longer low-discrepancy sequences result in a smaller total simulation effort than a large
number of short sequences. Indeed, the simulation effort for r = 30 RQMC sequences is about 50% higher than that for r = 5 or r = 10. We pointed out in Section 2 that for a given total simulation amount rm, smaller values of r paired with larger values of m should result in a more accurate integral-estimate. On the other hand, the trade-off for using small values of r is a less accurate variance estimate in (14). In order to implement MCEM using randomized Halton sequences, a balance has to be achieved between a more accurate integral-estimate (i.e. less noise) and a more accurate variance estimate. In our example, we found this balance for values of r between 5 and 10. We also experimented with values smaller than 5 and frequently encountered problems with the numerical stability of the estimate of the covariance matrix in (14).

In the final paragraphs of this section we take a closer look at the noise of the QMCEM algorithm and compare it to classical MCEM. Figure 3 visualizes the Monte Carlo error for three different Monte Carlo estimation methods: classical Monte Carlo using random sampling (column 1), randomized Quasi-Monte Carlo with r = 5 RQMC sequences (column 2) and pure Quasi-Monte Carlo without randomization (column 3).

Figure 3 about here

We can see that for classical Monte Carlo, the average parameter update (thick solid line) is very volatile and has wide confidence bounds (dotted lines). This suggests that the Monte Carlo error is huge. This is in strong contrast to QMC. Indeed, for pure QMC sampling the parameter updates are significantly less volatile, with much tighter confidence bounds. Notice that we allocated the same simulation effort to both simulation methods! It takes classical MCEM much larger sample sizes to reduce the noise to the same level as under QMC sampling.
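The noise comparison can be reproduced in miniature on a toy integrand of our own (not the geostatistical model): estimate \int_{C^2} xy \, dx \, dy = 1/4 with r = 8 independent copies under plain Monte Carlo and under random-start Halton sampling, with the same total budget N = rm for both. The spread across the copies plays the role of the Monte Carlo error visualized in Figure 3.

```python
import random
import statistics

def radical_inverse(k, b):
    # Base-b radical inverse phi_b(k) in [0, 1).
    inv, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, b)
        denom *= b
        inv += digit / denom
    return inv

def mean_f(points):
    # Empirical average of the toy integrand f(x, y) = x*y (true integral 1/4).
    return sum(x * y for x, y in points) / len(points)

rng = random.Random(42)
r, m = 8, 512                          # total budget N = r*m = 4096 points

mc_copies = [mean_f([(rng.random(), rng.random()) for _ in range(m)])
             for _ in range(r)]

def rqmc_copy():
    # One random-start Halton sequence in [0,1)^2 (bases 2 and 3).
    n1, n2 = rng.randrange(2 ** 32), rng.randrange(3 ** 32)
    return mean_f([(radical_inverse(n1 + k, 2), radical_inverse(n2 + k, 3))
                   for k in range(m)])

rqmc_copies = [rqmc_copy() for _ in range(r)]
mc_sd = statistics.stdev(mc_copies)      # spread across independent copies
rqmc_sd = statistics.stdev(rqmc_copies)  # typically much smaller
```

Both pooled estimates are unbiased for 1/4, but the RQMC copies typically scatter far less around it, which is precisely the behavior that lets QMCEM satisfy the confidence-ellipsoid rule with smaller sample sizes.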
We have argued at the beginning of this paper that in order to implement MCEM in an automated way, the ability to estimate the error in the Monte Carlo approximation is essential. Randomized QMC methods provide this ability. While randomized Halton sequences have the low-discrepancy property (and thus estimate the integral with a higher accuracy than classical Monte Carlo), randomization may not come for free. Indeed, the second column of Figure 3 shows that, while the error reduction is still substantial compared to a classical Monte Carlo approach, the system is noisier than under pure QMC sampling.
5 Conclusion

In this paper we have demonstrated how recent advances in randomized Quasi-Monte Carlo can be used to implement the MCEM algorithm in an automated, data-driven way. The empirical investigations provide encouraging evidence that this Quasi-Monte Carlo EM algorithm can lead to significant efficiency gains over implementations using regular Monte Carlo methods. We focused our investigations in this work on the randomized Halton sequence only. Other randomized Quasi-Monte Carlo methods exist; see, for example, Owen (1998a) or L'Ecuyer & Lemieux (2002). It could be a rewarding topic for future research to investigate the benefits of different Quasi-Monte Carlo methods for the implementation of Monte Carlo EM (and also other stochastic estimation methods that are frequently encountered in the statistics literature).

Acknowledgements

All the simulations in this work are based on the programming language Ox of Doornik (2001).
15 References Bhat, C. (2001). Quasi-random maximum simulated likelihood estimation for the mixed multinomial logit model. Transportation Research 35, Booth, J. G. & Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society B 61, Booth, J. G., Hobert, J. P. & Jank, W. (2001). A survey of Monte Carlo algorithms for maximizing the likelihood of a two-stage hierarchical model. Statistical Modelling 1, Bouleau, N. & Lépingle, D. (1994). Numerical Methods for Stochastic Processes. New York: Wiley. Boyles, R. A. (1983). On the convergence of the EM algorithm. Journal of the Royal Statistical Society B 45, Breslow, N. E. & Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, Caflisch, R., Morokoff, W. & Owen, A. (1997). Valuation of mortgage-backed securities using brownian bridges to reduce effective dimension. Journal of Computational Finance 1, Chan, K. S. & Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association 90, De Bruijn, N. G. (1958). Asymptotic Methods in Analysis. Amsterdam: North-Holland. Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, Diggle, P. J., Tawn, J. A. & Moyeed, R. A. (1998). Model-based geostatistics. Journal of the Royal Statistical Society A 47, Doornik, J. A. (2001). Ox: Object Oriented Matrix Programming. London: Timberlake. 15
16 Fang, K.-T. & Wang, Y. (1994). Number Theoretic Methods in Statistics. New York: Chapman & Hall. Faure, H. (1982). Discrépance de suites associées à un système de numération (en dimension s). Acta Arithmetica 41, Halton, J. H. (1960). On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numerische Mathematik 2, Hobert, J. P. (2000). Hierarchical models: A current computational perspective. Journal of the American Statistical Association 95, Kuk, A. Y. C. (1999). Laplace importance sampling for generalized linear mixed models. Journal of Statistical Computation and Simulation 63, Lange, K. (1995). A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society B 57, L Ecuyer, P. & Lemieux, C. (2002). Recent advances in randomized Quasi-Monte Carlo Methods. In Modeling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications, M. Dror, P. L Ecuyer & F. Szidarovszki, eds. Kluwer Academic Publishers. Lemieux, C. & L Ecuyer, P. (1998). Efficiency improvement by lattice rules for pricing asian options. In Proceedings of the 1998 Winter Simulation Conference. IEEE Press. Levine, R. & Fan, J. (2003). An automated (Markov Chain) Monte Carlo EM algorithm. Tech. rep., San Diego State University. Levine, R. A. & Casella, G. (2001). Implementations of the Monte Carlo EM algorithm. Journal of Computational and Graphical Statistics 10, McCulloch, C. E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association 92, Meng, X.-L. & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80,
Morokoff, W. J. & Caflisch, R. E. (1995). Quasi-Monte Carlo integration. Journal of Computational Physics 122.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. Philadelphia: SIAM.
Owen, A. (1998a). Scrambling Sobol and Niederreiter-Xing points. Journal of Complexity 14.
Owen, A. B. (1998b). Monte Carlo extension of Quasi-Monte Carlo. In 1998 Winter Simulation Conference Proceedings. New York: Springer.
Pagès, G. (1992). Van der Corput sequences, Kakutani transforms and one-dimensional numerical integration. Journal of Computational and Applied Mathematics 44.
Robert, C. P. & Casella, G. (1999). Monte Carlo Statistical Methods. New York: Springer.
Sobol, I. M. (1967). Distribution of points in a cube and approximate evaluation of integrals. U.S.S.R. Computational Mathematics and Mathematical Physics 7.
Wang, X. & Hickernell, F. J. (2000). Randomized Halton sequences. Mathematical and Computer Modelling 32.
Wei, G. C. G. & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association 85.
Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11.
Figure 1: 2500 points in the unit square. The upper plot shows the result of regular Monte Carlo sampling, that is, 2500 points selected at random. Random points tend to form clusters, oversampling the unit square in some places and leaving gaps in others, where the sample space is not explored at all. The lower plot shows the result of Quasi-Monte Carlo sampling: the first 2500 points of a two-dimensional Halton sequence.
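The Halton construction behind the lower panel is easy to sketch: the d-th coordinate of the i-th point is the radical inverse of i in the d-th prime base. A minimal Python sketch (the function names are ours, for illustration only):

```python
def radical_inverse(n, base):
    """Van der Corput radical inverse: reflect the base-b digits of n
    about the radix point, e.g. n = 6 = 110_2 -> 0.011_2 = 0.375."""
    inv, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        inv += digit / denom
    return inv

def halton_2d(n_points):
    """First n_points of the two-dimensional Halton sequence,
    using bases 2 and 3 (the first two primes)."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3))
            for i in range(1, n_points + 1)]

points = halton_2d(2500)  # a point set like the one plotted in Figure 1
```

Because consecutive integers cycle through all digit patterns, the resulting points fill the square far more evenly than random draws, which is exactly the contrast the figure illustrates.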
Figure 2: Geographical distribution of PDF purchases for Washington, DC. The upper plot shows the geographical borders of Washington, DC, as well as the geographical locations of the 39 purchases of PDF or Print. The lower plot displays the geographical scatter of the relative proportion of PDF purchases.
Figure 3: Monte Carlo error and Quasi-Monte Carlo error. Starting MCEM near the MLE, we performed 100 iterations using a fixed Monte Carlo sample size of rm_t = 1000, t = 1, ..., 100. We repeated this experiment 50 times for a) MCEM using classical Monte Carlo sampling (column 1); b) randomized Quasi-Monte Carlo with r = 5 (column 2); and c) pure Quasi-Monte Carlo without randomization, i.e. r = 1 (column 3). Panel rows correspond to the parameters β, σ and α. For each parameter value we plotted the average of the 50 iteration histories (thick, solid lines) as well as pointwise 95% confidence bounds (dotted lines).
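The r = 5 independent randomizations in column 2 are what make the approximation error measurable: each randomized copy of the point set yields an unbiased estimate, so a standard error can be computed across the r copies. The sketch below uses a Cranley-Patterson random shift as the randomization for illustration (the paper itself uses randomized Halton sequences in the sense of Wang & Hickernell, 2000, and these function names are ours):

```python
import math
import random

def halton_2d(n):
    """First n points of the 2-D Halton sequence (bases 2 and 3)."""
    def phi(i, b):
        x, denom = 0.0, 1.0
        while i > 0:
            i, d = divmod(i, b)
            denom *= b
            x += d / denom
        return x
    return [(phi(i, 2), phi(i, 3)) for i in range(1, n + 1)]

def rqmc_estimate(f, n_points=512, r=5, rng=None):
    """Average f over r independently shifted copies of a Halton point
    set (Cranley-Patterson rotation); return (estimate, standard error)."""
    rng = rng or random.Random(0)
    base = halton_2d(n_points)
    reps = []
    for _ in range(r):
        sx, sy = rng.random(), rng.random()   # one uniform shift per copy
        reps.append(sum(f((x + sx) % 1.0, (y + sy) % 1.0)
                        for x, y in base) / n_points)
    mean = sum(reps) / r
    se = math.sqrt(sum((e - mean) ** 2 for e in reps) / (r * (r - 1)))
    return mean, se

# Example: estimate E[XY] for X, Y ~ Uniform(0, 1); the exact value is 1/4.
est, se = rqmc_estimate(lambda x, y: x * y)
```

With r = 1 (column 3 of the figure) the same construction still gives a point estimate, but no error bar; this is why an automated, data-driven MCEM implementation needs r > 1.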
Table 1: Spatial model. The table investigates the efficiency of Quasi-Monte Carlo implementations of MCEM for fitting geostatistical models. We investigate three different Quasi-Monte Carlo (QMC) algorithms using r = 5, 10 and 30 independent RQMC sequences, respectively. These RQMC sequences are obtained via randomized Halton sequences, using Laplace importance sampling based on a t distribution with 10 degrees of freedom. We benchmark these three QMC algorithms against an implementation of MCEM based on regular Monte Carlo (MC) sampling using the same Laplace importance sampler. We start each algorithm from (β^(0), σ^2(0), α^(0)) = (0, 1, 1) and increase the length of the RQMC sequences according to Section 3.1 using α = 0.25 and κ = 0.2. The algorithm is terminated if the relative difference between two successive parameter updates falls below δ = 0.01 for 3 consecutive iterations. For each of the four MCEM implementations we performed this experiment 50 times, recording the final parameter values β_i, σ_i^2 and α_i and the total number of simulated vectors, N = sum_{j=1}^{T_i} r m_j, where T_i denotes the final iteration number (i = 1, ..., 50). The table displays the Monte Carlo average (AVG) and the Monte Carlo standard error (SE) for these values. For instance, for the regression parameter β it displays the average over the 50 replications and the Monte Carlo standard error s_β/√50, where s_β denotes the sample standard deviation over the 50 replicates. The table reports AVG and SE rows for β, σ^2, α and N, for MC and for QMC with r = 5, 10 and 30.
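The termination rule in the caption (relative change below δ = 0.01 for 3 consecutive iterations) is simple to state in code. A hedged sketch, assuming the relative difference is taken componentwise with a small constant guarding against division by zero (the paper's Section 3.1 may define it slightly differently):

```python
def has_converged(history, delta=0.01, consecutive=3):
    """Return True once the maximum relative change between successive
    parameter vectors has stayed below delta for `consecutive`
    iterations in a row. `history` is a list of parameter vectors."""
    if len(history) < consecutive + 1:
        return False
    recent = history[-(consecutive + 1):]
    for prev, curr in zip(recent, recent[1:]):
        rel = max(abs(c - p) / (abs(p) + 1e-8)
                  for p, c in zip(prev, curr))
        if rel >= delta:
            return False
    return True
```

In the experiment of Table 1, a check of this kind would be applied to the iterates (β^(t), σ^2(t), α^(t)) after every MCEM update; requiring several consecutive small changes guards against stopping early on a lucky Monte Carlo draw.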