arXiv: v1 [stat.CO] 23 Oct 2007
Adaptive Importance Sampling in General Mixture Classes

Olivier Cappé, LTCI, ENST & CNRS, Paris
Randal Douc, INT, Evry
Arnaud Guillin, École Centrale & LATP, CNRS, Marseille
Jean-Michel Marin, INRIA Futurs, Project select, Laboratoire de Mathématiques, Université Paris-Sud
& Christian P. Robert, CEREMADE, Université Paris Dauphine & CREST, INSEE

October 23, 2018

Abstract

In this paper, we propose an adaptive algorithm that iteratively updates both the weights and component parameters of a mixture importance sampling density so as to optimise the importance sampling performances, as measured by an entropy criterion. The method is shown to be applicable to a wide class of importance sampling densities, which includes in particular mixtures of multivariate Student t distributions. The performances of the proposed scheme are studied on both artificial and real examples, highlighting in particular the benefit of a novel Rao-Blackwellisation device which can be easily incorporated in the updating scheme.

Keywords: Importance Sampling mixtures, adaptive Monte Carlo, Population Monte Carlo, entropy.

1 Introduction

In recent years, there has been a renewed interest in using Monte Carlo procedures based on Importance Sampling (abbreviated to IS in the following) for inference tasks. Compared to alternatives such as Markov Chain Monte Carlo methods, the main appeal of IS procedures lies in the possibility of developing parallel implementations, which becomes more and more important with the generalisation of multiple core machines and computer clusters. Importance sampling procedures are also attractive in that they allow for an easy assessment of the Monte Carlo error and, correlatively, for the development of learning mechanisms. In many applications, the fact that IS procedures may be tuned by choosing an appropriate IS density to minimise the approximation error for a specific function of interest is also crucial.
On the other hand, the shortcomings of IS approaches are also well known, including a poor scaling to highly multidimensional problems and an acute sensitivity to the choice of the IS density, combined with the fact that it is impossible to come up with a universally efficient IS density. Adaptive Monte Carlo is a natural remedy for the latter class of difficulties: it gradually improves the IS density based on some form of Monte Carlo approximation. While there exists a wide variety of solutions in the literature (see, e.g., Robert and Casella, 2004, Chapter 14), this paper concentrates on the construction of iterated importance sampling schemes, or population Monte Carlo.

Footnote: This work has been supported by the Agence Nationale de la Recherche (ANR, 212, rue de Bercy Paris) through the project Adap MC. Both last authors are grateful to the participants to the BIRS 07w5079 meeting on Bioinformatics, Genetics and Stochastic Computation: Bridging the Gap, Banff, for their helpful comments. The last author also acknowledges a helpful discussion with Geoff McLachlan. Corresponding author.

Population Monte Carlo (or PMC) was introduced by Cappé et al. (2004) as a repeated Sampling Importance Resampling (SIR) procedure: once a sample $(X_1,\ldots,X_N)$ is produced by SIR, it provides an approximation to the target distribution $\pi$ and can be used as a stepping stone towards a better approximation to $\pi$. More precisely, if $(X_1,\ldots,X_N)$ is a sample approximately distributed from $\pi$, it may be perturbed stochastically using an arbitrary Markov transition kernel $q(x,x')$ so as to produce a new sample $(X'_1,\ldots,X'_N)$. Conducting a resampling step based on the IS weights $\omega_i = \pi(X'_i)/q(X_i,X'_i)$, we will then produce a new sample $(\tilde X_1,\ldots,\tilde X_N)$ that also constitutes an approximation to the target distribution $\pi$. Repeating this scheme in an iterative manner is however only of interest if samples that have been previously simulated are used to update (or adapt) the kernel $q(x,x')$, in the sense that keeping the same kernel $q$ over iterations does not modify the statistical properties of the sample produced at each iteration and, therefore, reduces the efficiency of the approximation by introducing extra Monte Carlo variance. Failing to improve upon the choice of the kernel $q$ thus cancels the appeal of using several iterations, when compared with one single IS draw with the same total sample size (see Douc et al., 2007a).
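The perturb, weight and resample pass described above can be sketched in a few lines. This is only an illustration under stated assumptions: the one-dimensional standard normal target, the Gaussian random-walk kernel and the names `log_target` and `sir_step` are hypothetical choices made for the example, not part of the paper.

```python
import math
import random

def log_target(x):
    # Hypothetical unnormalised 1-D target: standard normal, up to a constant.
    return -0.5 * x * x

def sir_step(sample, scale, rng):
    """One perturb / weight / resample pass with a Gaussian random-walk kernel q(x, x')."""
    # Perturb every point with the kernel q(x, .) = N(x, scale^2).
    moved = [x + rng.gauss(0.0, scale) for x in sample]
    # log w_i = log pi(X'_i) - log q(X_i, X'_i); the kernel's normalising
    # constant is the same for all i and cancels after normalisation.
    log_w = [log_target(xp) + 0.5 * ((xp - x) / scale) ** 2
             for x, xp in zip(sample, moved)]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # Multinomial resampling according to the normalised weights.
    resampled = rng.choices(moved, weights=w, k=len(sample))
    return resampled, w

rng = random.Random(0)
start = [rng.gauss(0.0, 3.0) for _ in range(2000)]  # crude initial sample
new_sample, weights = sir_step(start, 1.0, rng)
```

Because the kernel is kept fixed here, repeating `sir_step` would not improve the approximation, which is exactly the point made in the text about the need to adapt $q$.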
Population Monte Carlo therefore requires an updating scheme that takes advantage of previously generated samples so that it improves the choice of the IS transition kernel against a given measure of efficiency. In the approach advocated by Douc et al. (2007a), one considers a transition kernel $q_\alpha$ consisting of a mixture of fixed transition kernels

$$q_\alpha(x,x') = \sum_{d=1}^D \alpha_d\, q_d(x,x'), \qquad \sum_{d=1}^D \alpha_d = 1, \tag{1}$$

whose weights $\alpha_1,\ldots,\alpha_D$ are tuned adaptively. The proposed adaptation procedure aims at minimising the deviance or entropy criterion between the kernel $q_\alpha$ and the target $\pi$,

$$\mathcal{E}(\pi,q_\alpha) = \mathbb{E}^X_\pi\left[D(\pi \,\|\, q_\alpha(X,\cdot))\right], \tag{2}$$

where $D(p\,\|\,q) = \int \log\{p(x)/q(x)\}\,p(x)\,dx$ denotes the Kullback-Leibler divergence (also called relative entropy), and where the expectation is taken under the target distribution $X \sim \pi$ since the kernels $q_d(x,x')$ depend on the starting value $x$. In the sequel, we refer to the criterion in (2) as the entropy criterion as it is obviously related to the performance measure used in the cross-entropy method of Rubinstein and Kroese (2004). In Douc et al. (2007b), a version of this algorithm was developed to minimise the asymptotic variance of the IS procedure, for a specific function of interest, in lieu of the entropy criterion.

A major limitation in the approaches of both Douc et al. (2007a,b) is that the proposal kernels $q_d$ remain fixed over the iterative process while only the mixture weights $\alpha_d$ get improved. In the present contribution, we remove this limitation by extending the framework of Douc et al. (2007a) to allow for the adaption of IS densities of the form

$$q_{(\alpha,\theta)}(x) = \sum_{d=1}^D \alpha_d\, q_d(x;\theta_d), \tag{3}$$

with respect to both the weights $\alpha_d$ and the internal parameters $\theta_d$ of the component densities. In theory, as explained through the example considered in Section 4, the proposed adaptive scheme,
which relies on an integrated EM update mechanism, is applicable to more general families of latent-data IS densities. This proposed extension and, in particular, the introduction of (multidimensional) scaling parameters raises challenging robustness issues for which we propose a Rao-Blackwellisation scheme that empirically appears to be very efficient while inducing a modest additional algorithmic complexity.

Note that we consider here the generic entropy criterion of Douc et al. (2007a) rather than the function-specific variance minimisation objective of Douc et al. (2007b). This choice is motivated by the recognition that in most applications, the IS density is expected to perform well for a range of typical functions of interest rather than for a specific target function $h$. In addition, the generalisation of the approach of Douc et al. (2007b) to a class of mixture IS densities that are parameterised by more than the weights remains an open question (see also Section 5). A second remark is that, in contrast to the previously cited works, we consider in this paper only global independent IS densities of the form given in (3). Thus the proposed scheme really is an iterated importance sampling scheme, contrary to what happens when using more general IS transition kernels as in (1). Obviously, resorting to moves that depend on the current sample is initially attractive because it allows for some local moves as opposed to the global exploration provided by independent IS densities. However, the fact that the entropy criterion in (2) is a global measure of fit tends to modify the parameters of each transition kernel depending on its average performance over the whole sample, rather than locally. In addition, structurally imposing a dependence on the points sampled at the previous iteration induces some extra variability which can be detrimental when more parameters are to be estimated.
The paper is organised as follows: In Section 2, we develop a generic updating scheme for independent IS mixtures (3), establishing that the integrated EM argument of Douc et al. (2007a) remains valid in our setting. Note once again that the integrated EM update mechanism we uncover in this paper is applicable to all missing data representations of the proposal kernel, and not only to finite mixtures. In Section 3, we consider the case of Gaussian mixtures which naturally extend the case of mixtures of Gaussian random walks with fixed covariance structure considered in Douc et al. (2007a,b). In Section 4, we show that the algorithm also applies to mixtures of multivariate t distributions with the continuous scale mixing representation used in Peel and McLachlan (2000). Section 5 provides some concluding remarks about the performances of this approach as well as possible extensions.

2 Updating the IS density

2.1 Entropies and perplexity

When considering independent mixture IS densities of the form (3), the entropy criterion $\mathcal{E}$ defined in (2) reduces to the Kullback-Leibler divergence between the target density $\pi$ and the mixture $q_{(\alpha,\theta)}$:

$$\mathcal{E}(\pi,q_{(\alpha,\theta)}) = D(\pi \,\|\, q_{(\alpha,\theta)}) = \int \log\left(\frac{\pi(x)}{\sum_{d=1}^D \alpha_d\, q_d(x;\theta_d)}\right)\pi(x)\,dx. \tag{4}$$

As usual in applications of the IS approach to Bayesian inference, the target density $\pi$ is known up to a normalisation constant only and we will focus on the self-normalised version of IS which only requires the knowledge of an unnormalised version $\pi_{\mathrm{unn}}$ of $\pi$ (Geweke, 1989). As a side comment, note that while $\mathcal{E}(\pi,q_{(\alpha,\theta)})$ is a convex function of the weights $\alpha_1,\ldots,\alpha_D$ (Douc et al., 2007a), it is generally not so when also optimising with respect to the component parameters $\theta_1,\ldots,\theta_D$.

It is well known that if one considers a function $h$ of interest, the self-normalised IS estimation
of its expectation, $\hat\pi(h) = \sum_{i=1}^N \bar\omega_i\, h(X_i)$, where $\bar\omega_i = \{\pi(X_i)/q_{(\alpha,\theta)}(X_i)\} \big/ \sum_{j=1}^N \{\pi(X_j)/q_{(\alpha,\theta)}(X_j)\}$ and $X_i \sim q_{(\alpha,\theta)}$, has an asymptotic variance of

$$\upsilon(h) = \int \{h(x)-\pi(h)\}^2\, \frac{\pi^2(x)}{q_{(\alpha,\theta)}(x)}\,dx,$$

assuming that $\int (1+h^2(x))\,\pi^2(x)/q_{(\alpha,\theta)}(x)\,dx < \infty$. In addition, this asymptotic variance may be consistently estimated using the IS sample itself as $N \sum_{i=1}^N \bar\omega_i^2 \{h(X_i)-\hat\pi(h)\}^2$ (Geweke, 1989). Obviously, for a given function $h$, there is no direct link between $\upsilon(h)$ and the entropy criterion in (4), a fact that motivated the work of Douc et al. (2007b). However it is easily shown that

$$\sup_{\{h:\, |h-\pi(h)| \le M\}} \upsilon(h) = M^2 \int \frac{\pi^2(x)}{q_{(\alpha,\theta)}(x)}\,dx,$$

where the latter integral term is lower and upper bounded by 1 and $\exp[\mathcal{E}(\pi,q_{(\alpha,\theta)})]$ respectively, by direct applications of Jensen's inequality. Hence minimising $\mathcal{E}(\pi,q_{(\alpha,\theta)})$ indeed reduces the worst case performance of the IS approach, at least for bounded functions. In addition, rewriting

$$\exp[\mathcal{E}(\pi,q_{(\alpha,\theta)})] = \exp\left(\int \log\frac{\pi_{\mathrm{unn}}(x)}{q_{(\alpha,\theta)}(x)}\,\pi(x)\,dx\right)\left(\int \pi_{\mathrm{unn}}(x)\,dx\right)^{-1}$$

and estimating the first integral by self-normalised IS, as $\sum_{i=1}^N \bar\omega_i \log\{\pi_{\mathrm{unn}}(X_i)/q_{(\alpha,\theta)}(X_i)\}$, and the second one by classical IS, as $N^{-1}\sum_{i=1}^N \pi_{\mathrm{unn}}(X_i)/q_{(\alpha,\theta)}(X_i)$, shows that $\exp(H_N)/N$, where $H_N = -\sum_{i=1}^N \bar\omega_i \log \bar\omega_i$ is the Shannon entropy of the normalised IS weights, is an estimator of the inverse of the term $\exp[\mathcal{E}(\pi,q_{(\alpha,\theta)})]$. Thus, minimisation of the entropy criterion is connected with the maximisation of $\exp(H_N)/N$, where $H_N$ is the entropy of the IS weights, a frequently used criterion for assessing the quality of an IS sample together with the so-called Effective Sample Size (ESS) criterion (Chen and Liu, 1996, Doucet et al., 2001, Cappé et al., 2005). In the following, we refer to $\exp(H_N)/N$ as the normalised perplexity of the IS weights, following the terminology in use in the field of natural language processing.
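As a small illustration of the two diagnostics just mentioned, the sketch below computes the normalised perplexity $\exp(H_N)/N$ and a normalised ESS from unnormalised log-weights. The function names are hypothetical, and the ESS formula $1/\sum_i \bar\omega_i^2$ is the usual definition from the literature, assumed here because the text does not spell it out.

```python
import math

def normalise_log_weights(log_w):
    """Turn unnormalised log-weights into weights that sum to one (stable in log space)."""
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def normalised_perplexity(wbar):
    """exp(H_N) / N, with H_N the Shannon entropy of the normalised weights."""
    entropy = -sum(w * math.log(w) for w in wbar if w > 0.0)
    return math.exp(entropy) / len(wbar)

def normalised_ess(wbar):
    """Assumed standard ESS, 1 / sum_i wbar_i^2, divided by N."""
    return 1.0 / (len(wbar) * sum(w * w for w in wbar))

wbar = normalise_log_weights([0.0, -0.3, -2.0, -5.0])
perp, ess = normalised_perplexity(wbar), normalised_ess(wbar)
```

Both quantities lie between $1/N$ (all the mass on a single point) and 1 (uniform weights), so they can be compared across sample sizes.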
2.2 Integrated updates

Let $\alpha^t = (\alpha^t_1,\ldots,\alpha^t_D)$ and $\theta^t = (\theta^t_1,\ldots,\theta^t_D)$ denote, respectively, the mixture weights and the component parameters at the $t$-th iteration of the algorithm (where $t = 1,\ldots,T$). In order to update the parameters $(\alpha^t,\theta^t)$ of the independent IS density (3), we will take advantage of the latent variable structure that underlies the objective function (4). The resulting algorithm (still theoretical at this stage as it involves integration with respect to $\pi$) may be interpreted as an integrated EM (Expectation-Maximisation) scheme that we now describe. Given that minimising (4) in $(\alpha,\theta)$ is equivalent to maximising

$$\int \log\left(\sum_{d=1}^D \alpha_d\, q_d(x;\theta_d)\right)\pi(x)\,dx,$$
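we are facing a task that formally resembles standard mixture maximum likelihood estimation but with an integration with respect to $\pi$ replacing the empirical sum over observations. As usual in mixtures, the latent variable $Z$ is the component indicator, with values in $\{1,\ldots,D\}$ such that the joint density $f$ of $x$ and $z$ satisfies $f(z) = \alpha_z$ and $f(x\mid z) = q_z(x;\theta_z)$, which produces (3) as the marginal in $x$. At iteration $t$ of our algorithm, we can therefore take advantage of this latent variable representation by considering the expected complete log-likelihood

$$\mathbb{E}^X_\pi\left[\mathbb{E}^Z_{(\alpha^t,\theta^t)}\{\log(\alpha_Z\, q_Z(X;\theta_Z)) \mid X\}\right],$$

where the inner expectation is computed under the conditional distribution of $Z$ for the current value of the parameters, $(\alpha^t,\theta^t)$, i.e.

$$f(z\mid x) = \alpha^t_z\, q_z(x;\theta^t_z) \Big/ \sum_{d=1}^D \alpha^t_d\, q_d(x;\theta^t_d).$$

The updating mechanism in our algorithm then corresponds to setting the new parameters $(\alpha^{t+1},\theta^{t+1})$ equal to

$$(\alpha^{t+1},\theta^{t+1}) = \arg\max_{(\alpha,\theta)} \mathbb{E}^X_\pi\left[\mathbb{E}^Z_{(\alpha^t,\theta^t)}\{\log(\alpha_Z\, q_Z(X;\theta_Z)) \mid X\}\right],$$

as in a regular EM estimation of the parameters of a mixture, except for the extra expectation over $X$. The convexity argument that shows that EM increases the objective function at each step also applies in this setup. Solving the maximisation program, we have

$$(\alpha^{t+1},\theta^{t+1}) = \arg\max_{(\alpha,\theta)} \left( \mathbb{E}^X_\pi\left[\mathbb{E}^Z_{(\alpha^t,\theta^t)}\{\log(\alpha_Z) \mid X\}\right] + \mathbb{E}^X_\pi\left[\mathbb{E}^Z_{(\alpha^t,\theta^t)}\{\log(q_Z(X;\theta_Z)) \mid X\}\right] \right).$$

If we define $g_1(\alpha) = \mathbb{E}^X_\pi\left[\mathbb{E}^Z_{(\alpha^t,\theta^t)}\{\log(\alpha_Z) \mid X\}\right]$ and $g_2(\theta) = \mathbb{E}^X_\pi\left[\mathbb{E}^Z_{(\alpha^t,\theta^t)}\{\log(q_Z(X;\theta_Z)) \mid X\}\right]$, we get

$$(\alpha^{t+1},\theta^{t+1}) = \arg\max_{(\alpha,\theta)} \left(g_1(\alpha)+g_2(\theta)\right) = \left(\arg\max_{\alpha} g_1(\alpha),\ \arg\max_{\theta} g_2(\theta)\right).$$

Therefore, setting

$$\rho_d(X;\alpha,\theta) = \alpha_d\, q_d(X;\theta_d) \Big/ \sum_{l=1}^D \alpha_l\, q_l(X;\theta_l), \tag{5}$$

we need to solve

$$\alpha^{t+1} = \arg\max_{\alpha} \mathbb{E}^X_\pi\left[\sum_{d=1}^D \rho_d(X;\alpha^t,\theta^t)\log(\alpha_d)\right],$$

and, for $d \in \{1,\ldots,D\}$, we obtain

$$\alpha^{t+1}_d = \mathbb{E}^X_\pi\left[\rho_d(X;\alpha^t,\theta^t)\right]. \tag{6}$$

Similarly,

$$\theta^{t+1} = \arg\max_{\theta} \mathbb{E}^X_\pi\left[\sum_{d=1}^D \rho_d(X;\alpha^t,\theta^t)\log(q_d(X;\theta_d))\right],$$

and, for $d \in \{1,\ldots,D\}$,

$$\theta^{t+1}_d = \arg\max_{\theta_d} \mathbb{E}^X_\pi\left[\rho_d(X;\alpha^t,\theta^t)\log(q_d(X;\theta_d))\right]. \tag{7}$$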
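The step from the constrained maximisation in $\alpha$ to the closed form (6) is the usual Lagrange multiplier argument. It is spelled out below as a worked sketch; the multiplier $\lambda$ and the shorthand $\mathcal{L}$ are notation introduced only for this derivation.

```latex
\begin{align*}
\mathcal{L}(\alpha,\lambda)
  &= \sum_{d=1}^D \mathbb{E}^X_\pi\!\left[\rho_d(X;\alpha^t,\theta^t)\right]\log\alpha_d
     \;-\; \lambda\Big(\sum_{d=1}^D \alpha_d - 1\Big), \\
\frac{\partial \mathcal{L}}{\partial \alpha_d} = 0
  &\iff \alpha_d = \lambda^{-1}\,\mathbb{E}^X_\pi\!\left[\rho_d(X;\alpha^t,\theta^t)\right], \\
\sum_{d=1}^D \alpha_d = 1
  \ \text{and}\ \sum_{d=1}^D \rho_d(X;\alpha^t,\theta^t) = 1
  &\implies \lambda = 1
  \quad\text{and}\quad
  \alpha_d^{t+1} = \mathbb{E}^X_\pi\!\left[\rho_d(X;\alpha^t,\theta^t)\right].
\end{align*}
```

The objective is concave in $\alpha$ on the simplex, so this stationary point is the maximiser.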
As in the regular mixture estimation problem, the resolution of this maximisation program ultimately depends on the shape of the density $q_d$. If $q_d$ belongs to an exponential family, it is easy to derive a closed-form solution, which however involves expectations under $\pi$. Section 3 provides an illustration of this fact in the Gaussian case, while the non-exponential Student's t case is considered in Section 4.

2.3 Approximate updates

As argued before, the adaptivity of the proposed procedure is achieved by updating the parameters based on the previously simulated sample. We thus start the PMC algorithm by arbitrarily fixing the mixture parameters $(\alpha^0,\theta^0)$ and we then sample from the resulting proposal $\sum_{d=1}^D \alpha^0_d\, q_d(x;\theta^0_d)$ to obtain our initial sample $(X_{i,0})_{1\le i\le N}$, associated with the latent variables $(Z_{i,0})_{1\le i\le N}$ that indicate from which component of the mixture the corresponding $(X_{i,0})_{1\le i\le N}$ have been generated. From this stage, we proceed recursively. Starting at time $t$ from a sample $(X_{i,t})_{1\le i\le N}$, associated with $(Z_{i,t})_{1\le i\le N}$ and with $(\alpha^{t,N},\theta^{t,N})$, we denote by $\bar\omega_{i,t}$ the normalised importance weights of the sample point $X_{i,t}$:

$$\bar\omega_{i,t} = \frac{\pi(X_{i,t})}{\sum_{d=1}^D \alpha^{t,N}_d\, q_d(X_{i,t};\theta^{t,N}_d)} \Bigg/ \sum_{j=1}^N \frac{\pi(X_{j,t})}{\sum_{d=1}^D \alpha^{t,N}_d\, q_d(X_{j,t};\theta^{t,N}_d)}. \tag{8}$$

To approximate (6) and (7), Douc et al. (2007a) proposed the following update rule:

$$\alpha^{t+1,N}_d = \sum_{i=1}^N \bar\omega_{i,t}\, \mathbb{1}\{Z_{i,t} = d\}, \qquad \theta^{t+1,N}_d = \arg\max_{\theta_d} \sum_{i=1}^N \bar\omega_{i,t}\, \mathbb{1}\{Z_{i,t} = d\}\, \log\{q_d(X_{i,t};\theta_d)\}. \tag{9}$$

The computational cost of this update is of order $N$ whatever the number $D$ of components is, since the weight and the parameter of each component are updated based only on the points that were actually generated from this component. However, this observation also suggests that (9) may be highly variable when $N$ is small and/or $D$ becomes larger.
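The weight computation (8) and the weight part of the basic update (9) can be sketched as follows. The helper names and the toy inputs are hypothetical; the sketch assumes that the log target values, the log proposal values and the component labels $Z_{i,t}$ (coded from 0) have already been computed for the current sample.

```python
import math

def log_sum_exp(values):
    """Stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def normalised_is_weights(log_pi, log_q):
    """Eq. (8): weights proportional to pi(X_i) / q(X_i), normalised to sum to one."""
    log_w = [lp - lq for lp, lq in zip(log_pi, log_q)]
    norm = log_sum_exp(log_w)
    return [math.exp(lw - norm) for lw in log_w]

def update_alpha_basic(wbar, labels, n_components):
    """Weight part of Eq. (9): alpha_d = sum_i wbar_i * 1{Z_i = d}."""
    alpha = [0.0] * n_components
    for w, z in zip(wbar, labels):
        alpha[z] += w  # each point only feeds the component that generated it
    return alpha

# Toy inputs: four points, the last two drawn where the proposal is twice too dense.
wbar = normalised_is_weights([0.0, 0.0, 0.0, 0.0],
                             [0.0, 0.0, math.log(2.0), math.log(2.0)])
alpha_next = update_alpha_basic(wbar, [0, 0, 1, 1], 2)
```

A component that happens to generate few points receives an update based on very few terms, which is the source of the variability noted above.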
To make the update more robust, we propose a simple Rao-Blackwellisation step that consists in replacing $\mathbb{1}\{Z_{i,t} = d\}$ with its conditional expectation given $X_{i,t}$, that is, $\rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N})$:

$$\alpha^{t+1,N}_d = \sum_{i=1}^N \bar\omega_{i,t}\, \rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N}), \qquad \theta^{t+1,N}_d = \arg\max_{\theta_d} \sum_{i=1}^N \bar\omega_{i,t}\, \rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N})\, \log\{q_d(X_{i,t};\theta_d)\}. \tag{10}$$

Examining (5) indicates why the evaluation of the posterior probabilities $\rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N})$ does not represent a significant additional computational cost in the PMC scheme, given that the denominator of this expression has already been computed when evaluating the IS weights according to (8). The most significant difference between (9) and (10) is that, with the latter, all points contribute to the updating of the $d$-th component, for an overall cost proportional to $DN$. Note however that in many applications of interest, the most significant computational cost is associated with the evaluation of $\pi$ so that the cost of the update is mostly negligible, even with the Rao-Blackwellised version.

Convergence of the estimated updated parameters as $N$ increases can be established using the same approach as in Douc et al. (2007a,b), relying mainly on the convergence property of triangular arrays
of random variables (see Theorem A.1 in Douc et al., 2007a). For the Rao-Blackwellised version, assuming that, for all $\theta$'s, $\pi(q_d(\cdot;\theta_d) = 0) = 0$ and that, for all $\alpha$'s and $\theta$'s, $\rho_d(\cdot;\alpha,\theta)\log q_d(\cdot;\theta_d) \in L^1(\pi)$, together with some (uniform in $x$) regularity conditions on $q_d(x;\theta)$ viewed as a function of $\theta$, yields

$$\alpha^{t+1,N}_d \xrightarrow{P} \alpha^{t+1}_d, \qquad \theta^{t+1,N}_d \xrightarrow{P} \theta^{t+1}_d$$

when $N$ goes to infinity. Note that we do not expand on the regularity conditions imposed on $q_d$ since, for the algorithm to be efficient, we definitely need a closed-form expression for the parameter updates. It is then easier to deal with the convergence of the approximation of these update formulas on a case-by-case basis, as will be seen for instance in the following Gaussian example.

3 The Gaussian mixture case

As a first example, we consider the case of $p$-dimensional Gaussian mixture IS densities of the form

$$q_d(X;\theta_d) = \{(2\pi)^p\, |\Sigma_d|\}^{-1/2} \exp\left\{-\frac{1}{2}(X-\mu_d)^T \Sigma_d^{-1} (X-\mu_d)\right\},$$

where $\theta_d = (\mu_d,\Sigma_d)$ denotes the parameters of the $d$-th Gaussian component density. This parametrisation of the IS density provides a general framework for approximating multivariate targets $\pi$ and the corresponding adaptive algorithm is a straightforward instance of the general framework discussed in the previous section.

3.1 Update formulas

The integrated update formulas are obtained as the solution of

$$\theta^{t+1}_d = \arg\min_{\theta_d} \mathbb{E}^X_\pi\left[\rho_d(X;\alpha^t,\theta^t)\left(\log|\Sigma_d| + (X-\mu_d)^T \Sigma_d^{-1} (X-\mu_d)\right)\right].$$

It is straightforward to check that the infimum is reached when, for $d \in \{1,\ldots,D\}$,

$$\mu^{t+1}_d = \frac{\mathbb{E}^X_\pi\left[\rho_d(X;\alpha^t,\theta^t)\,X\right]}{\mathbb{E}^X_\pi\left[\rho_d(X;\alpha^t,\theta^t)\right]}$$

and

$$\Sigma^{t+1}_d = \frac{\mathbb{E}^X_\pi\left[\rho_d(X;\alpha^t,\theta^t)\,(X-\mu^{t+1}_d)(X-\mu^{t+1}_d)^T\right]}{\mathbb{E}^X_\pi\left[\rho_d(X;\alpha^t,\theta^t)\right]}.$$

At iteration $t+1$ of the PMC algorithm, both the numerator and the denominator of each of the above expressions are approximated using self-normalised importance sampling, yielding the following empirical update equations for the basic updating strategy

$$\begin{aligned}
\alpha^{t+1,N}_d &= \sum_{i=1}^N \bar\omega_{i,t}\, \mathbb{1}\{Z_{i,t} = d\}, \\
\mu^{t+1,N}_d &= \frac{\sum_{i=1}^N \bar\omega_{i,t}\, X_{i,t}\, \mathbb{1}\{Z_{i,t} = d\}}{\sum_{i=1}^N \bar\omega_{i,t}\, \mathbb{1}\{Z_{i,t} = d\}}, \\
\Sigma^{t+1,N}_d &= \frac{\sum_{i=1}^N \bar\omega_{i,t}\, (X_{i,t}-\mu^{t+1,N}_d)(X_{i,t}-\mu^{t+1,N}_d)^T\, \mathbb{1}\{Z_{i,t} = d\}}{\sum_{i=1}^N \bar\omega_{i,t}\, \mathbb{1}\{Z_{i,t} = d\}},
\end{aligned} \tag{11}$$
and

$$\begin{aligned}
\alpha^{t+1,N}_d &= \sum_{i=1}^N \bar\omega_{i,t}\, \rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N}), \\
\mu^{t+1,N}_d &= \frac{\sum_{i=1}^N \bar\omega_{i,t}\, X_{i,t}\, \rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N})}{\sum_{i=1}^N \bar\omega_{i,t}\, \rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N})}, \\
\Sigma^{t+1,N}_d &= \frac{\sum_{i=1}^N \bar\omega_{i,t}\, (X_{i,t}-\mu^{t+1,N}_d)(X_{i,t}-\mu^{t+1,N}_d)^T\, \rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N})}{\sum_{i=1}^N \bar\omega_{i,t}\, \rho_d(X_{i,t};\alpha^{t,N},\theta^{t,N})},
\end{aligned} \tag{12}$$

for the Rao-Blackwellised scheme. Note that, as discussed at the end of Section 2, in the Gaussian case the convergence of the parameter update can be established by assuming only that $\rho_d(x;\alpha,\theta)\,\|x\|^2$ is integrable with respect to $\pi$.

3.2 A simulation experiment

To illustrate the results of the algorithm presented above, we consider a toy example in which the target density consists of a mixture of two multivariate Gaussian densities. The appeal of this example is that it is sufficiently simple to allow for an explicit characterisation of the attractive points for the adaptive procedure, while still illustrating the variety of situations found in more realistic applications. In particular, the model contains an attractive point that does not correspond to the global minimum of the entropy criterion as well as some regions of attraction that can eventually lead to a failure of the algorithm. The results obtained on this example also illustrate the improvement brought by the Rao-Blackwellised update formulas in (12).

The target $\pi$ is a mixture of two $p$-dimensional Gaussian densities such that

$$\pi(x) = 0.5\,N(x; -s u_p, I_p) + 0.5\,N(x; s u_p, I_p),$$

where $u_p$ is the $p$-dimensional vector whose coordinates are equal to 1 and $I_p$ stands for the identity matrix. In the sequel, we focus on the case where $p = 10$ and $s = 2$. Note that one should not be misled by the image given by the marginal densities of $\pi$: in the ten-dimensional space, the two components of $\pi$ are indeed very far from one another. It is for instance straightforward to check that the Kullback-Leibler divergence between the two components of $\pi$, $D\{N(-s u_p, I_p) \,\|\, N(s u_p, I_p)\}$, is equal to $\frac{1}{2}\|2 s u_p\|^2 = 2 s^2 p$, that is 80 in the case under consideration.
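The Rao-Blackwellised update (12) can be sketched as below. To stay within the standard library, the sketch is a univariate analogue (scalar mean and variance per component) rather than the $p$-dimensional version used in the paper, and all function names are hypothetical. The last lines simply recompute the Kullback-Leibler value $2s^2p$ quoted above.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def responsibilities(x, alpha, mu, var):
    """Eq. (5): rho_d(x) = alpha_d q_d(x) / sum_l alpha_l q_l(x)."""
    terms = [a * normal_pdf(x, m, v) for a, m, v in zip(alpha, mu, var)]
    total = sum(terms)  # the mixture density; may underflow for extreme x in this sketch
    return [t / total for t in terms]

def rb_gaussian_update(xs, wbar, alpha, mu, var):
    """Univariate analogue of Eq. (12): every point contributes to every component."""
    rho = [responsibilities(x, alpha, mu, var) for x in xs]
    new_alpha, new_mu, new_var = [], [], []
    for d in range(len(alpha)):
        mass = sum(w * r[d] for w, r in zip(wbar, rho))
        m = sum(w * r[d] * x for w, r, x in zip(wbar, rho, xs)) / mass
        v = sum(w * r[d] * (x - m) ** 2 for w, r, x in zip(wbar, rho, xs)) / mass
        new_alpha.append(mass)
        new_mu.append(m)
        new_var.append(v)
    return new_alpha, new_mu, new_var

# Kullback-Leibler divergence between the two target components: 0.5 * ||2 s u_p||^2.
s, p = 2.0, 10
kl_between_modes = 0.5 * p * (2.0 * s) ** 2
```

Swapping `rho` for 0/1 indicators of the generating component would give the corresponding analogue of the basic update (11).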
In particular, if we were to use one of the components of the mixture as an IS density for the other, we know from the arguments exposed at the beginning of Section 2 that the normalised perplexity of the weights will eventually tend to $\exp(-80)$. This number is so small that, for any feasible sample size, using one of the component densities of $\pi$ as an IS instrumental density for the other component or even for $\pi$ itself can only provide useless biased estimates.

The initial IS density $q_0$ is chosen here as the isotropic ten-dimensional Gaussian density with a covariance matrix of $5 I_p$. The performances of $q_0$ as an importance sampling density, when compared to various other alternatives, are fully detailed in Table 1 below but the general comment is that it corresponds to a poor initial guess which would provide highly variable results when used with any sample size under 50,000. In addition to figures related to the initial IS density $q_0$, Table 1 also reports the performance obtained with the best fitting Gaussian IS density (with respect to the entropy criterion), which is straightforwardly obtained as the centred Gaussian density whose covariance matrix matches the one of $\pi$, that is, $I_p + s^2 u_p u_p^T$. Of course the best possible performance achievable with a mixture of two Gaussian densities, always with the entropy criterion, is obtained when using $\pi$ as an IS density (third line of Table 1). Finally both final lines of Table 1 report the best fit obtained with IS densities of the form $0.9 \sum_{d=1}^D \alpha_d N(\mu_d,\Sigma_d) + 0.1\, q_0(\cdot)$ when, respectively, $D = 1$ and $D = 2$ (further comments on the use of these are given below).

Proposal                                                N-PERP    N-ESS     σ²(x₁)
q_0                                                     6.5E-4    1.5E-4    37E3
Best fitting Gaussian
Target mixture
Best fitting Gaussian (defensive option)
Best fitting two Gaussian mixture (defensive option)

Table 1: Performance of various importance sampling densities in terms of N-PERP: Normalised perplexity; N-ESS: Normalised Effective Sample Size; σ²(x₁): Asymptotic variance of self-normalised IS estimator for the coordinate projection function $h(x) = x_1$. Quantities marked with a dagger sign are straightforward to determine, all others have been obtained using IS with a sample size of one million.

As a general comment on Table 1, note that the variations of the perplexity of the IS weights, of the ESS and of the asymptotic variance of the IS estimate for the coordinate projection function are strongly correlated. This is a phenomenon that we have observed on many examples and which justifies our postulate that minimising the entropy criterion does provide very significant variance reductions for the IS estimate of typical functions of interest.

In this example, one may categorise the possible outcomes of adaptive IS algorithms based on mixtures of Gaussian IS densities into mostly four situations:

Disastrous (D.) After $T$ iterations of the PMC scheme, $q_{(\alpha^T,\theta^T)}$ is not a valid IS density and may lead to inconsistent estimates. Typically, this may happen if $q_{(\alpha^T,\theta^T)}$ becomes much too peaky with light tails. As discussed above, it will also practically be the case if the algorithm only succeeds in fitting $q_{(\alpha^T,\theta^T)}$ to one of the two Gaussian modes of $\pi$. Another disastrous outcome is when the direct application of the adaptation rules described above leads to numerical problems, usually due to the poor conditioning of some of the covariance matrices $\Sigma_d$.
Rather than fixing these issues by ad-hoc solutions (e.g. diagonal loading), which could nonetheless be useful in practical applications, we consider below more principled ways of making the algorithm more resistant to such failures.

Mediocre (M.) After adaptation, $q_{(\alpha^T,\theta^T)}$ is not significantly better than $q_0$ in terms of the performance criteria displayed in Table 1 and, in this case, the adaptation is useless.

Good (G.) After $T$ iterations, $q_{(\alpha^T,\theta^T)}$ selects the best fitting Gaussian approximation (second line of Table 1) which already provides a very substantial improvement as it results in variance reductions by about four orders of magnitude for typical functions of interest.

Excellent (E.) After $T$ iterations, $q_{(\alpha^T,\theta^T)}$ selects the best fitting mixture of two Gaussian densities, which in this somewhat artificial example corresponds to a perfect fit of $\pi$. Note, however, that the actual gain over the previous outcome is rather moderate, with a reduction of variance by a factor of less than four.

Of course, a very important parameter here is the IS sample size $N$: for a given initial IS density $q_0$, if $N$ is too small, any method based on IS is bound to fail; conversely, when $N$ gets large, all reasonable algorithms are expected to reach either the G. or E. result. Note that with local adaptive rules such as the ones proposed in this paper, it is not possible to guarantee that only the E. outcome will be achieved as the best fitting Gaussian IS density is indeed a stationary point (and in fact a
local minimum) of the entropy criterion. So, depending on the initialisation, there always is a non-zero probability that the algorithm converges to the G. situation only.

To focus on situations where algorithmic robustness is an issue, we purposely chose a rather small IS sample size of $N = 5{,}000$ points. As discussed above, direct IS estimates using $q_0$ as IS density would be mostly useless with such a modest sample size. We evaluated four algorithmic versions of the PMC algorithm. The first, Plain PMC, uses the parameter update formulas in (11) and $q_0$ is only used as an initialisation value, which is common to all $D$ components of the mixture (which also initially have equal weights). Only the means of the components are slightly perturbed to make it possible for the adaptation procedure to actually provide distinct mixture components.

One drawback of the plain PMC approach is that we do not ensure during the course of the algorithm that the adapted mixture IS density remains valid, in particular that it provides reliable estimates of the parameter update formulas. To guarantee that the IS weights stay well behaved, we consider a version of the PMC algorithm in which the IS density is of the form

$$(1-\alpha_0) \sum_{d=1}^D \alpha_d\, N(\mu_d,\Sigma_d) + \alpha_0\, q_0,$$

with the difference that $\alpha_0$ is a fixed parameter which is not adapted. The aim of this version, which we call Defensive PMC in reference to the work of Hesterberg (1995), is to guarantee that the importance function remains bounded by $\alpha_0^{-1}\pi(x)/q_0(x)$, whatever happens during the adaptation, thus guaranteeing a finite variance. Since $q_0$ is a poor IS density, it is preferable to keep $\alpha_0$ as low as possible and we use $\alpha_0 = 0.1$ in all the following simulations. As detailed in both last lines of Table 1, this modification will typically slightly limit the performances achievable by the adaptation procedure, although this drawback could probably be avoided by allowing for a decrease of $\alpha_0$ during the iterations of the PMC.
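The bound on the importance function offered by the defensive component can be checked numerically with a minimal sketch. The univariate densities, the deliberately over-peaked adapted component and the function names below are hypothetical choices made for illustration only.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def defensive_density(x, adapted_pdf, q0_pdf, alpha0=0.1):
    """(1 - alpha0) * adapted mixture + alpha0 * q0, with alpha0 held fixed."""
    return (1.0 - alpha0) * adapted_pdf(x) + alpha0 * q0_pdf(x)

def target(x):
    return normal_pdf(x, 0.0, 1.0)    # toy target

def adapted(x):
    return normal_pdf(x, 0.0, 0.01)   # adapted density that has become far too peaky

def q0(x):
    return normal_pdf(x, 0.0, 5.0)    # wide initial density

x = 3.0
plain_weight = target(x) / adapted(x)
defensive_weight = target(x) / defensive_density(x, adapted, q0)
bound = target(x) / (0.1 * q0(x))     # alpha0^{-1} pi(x) / q0(x)
```

At this point in the tail the plain weight is astronomically large, while the defensive weight stays below `bound`, whatever the adapted component looks like.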
The parameter update formulas for this modified mixture model are very easily deduced from (11) and are omitted here for the sake of conciseness. The third version we considered is termed Rao-Blackwellised PMC and consists in replacing the update equations (11) by their Rao-Blackwellised version (12). Finally, we consider a fourth option in which both the defensive mixture density and the Rao-Blackwellised update formulas are used.

All simulations were carried out using a sample size of $N = 5{,}000$, 20 iterations of the PMC algorithm and Gaussian mixtures with $D = 3$ components. Note that we purposely avoided choosing $D = 2$ to avoid the very artificial perfect fit phenomenon. This also means that for most runs of the algorithm, at least one component will disappear (by convergence of its weight to zero) or will be duplicated, with several components sharing very similar parameters.

                     Disastrous    Mediocre    Good    Excellent
Plain
Defensive
R.-B.
Defensive + R.-B.

Table 2: Number of outcomes of each category for the four algorithmic versions, as recorded from 100 independent runs.

Table 2 displays the performances of the four algorithms in repeated independent adaptation runs. The most significant observation about Table 2 is the large gap in robustness between the non Rao-Blackwellised versions of the algorithm, which returned disastrous or mediocre results in about 60% of the cases, and the versions using the Rao-Blackwellised update formulas, for which this fraction falls below 20%. Obviously the fact that the Rao-Blackwellised updates are based on all simulated values and not just on those actually simulated from a particular mixture component is a major source of robustness
of the method when the sample size $N$ is small, given the misfit of the initial IS density $q_0$. The same remark also applies when the PMC algorithm is to be implemented with a large number $D$ of components.

The role of the defensive mixture component is more modest although it does improve the performances of both versions of the algorithm (non Rao-Blackwellised and Rao-Blackwellised alike), at the price of a slight reduction of the frequency of the Excellent outcome. Also notice that the results obtained when the defensive mixture component is used are slightly beyond those of the unconstrained adaptation (see Table 1). The frequency of the perfect or Excellent match is about 10% for all methods but this is a consequence of the local nature of the adaptation rule as well as of the choice of the initialisation of the algorithm. It should be stressed however that, as we are not interested in modelling $\pi$ by a mixture but rather are seeking good IS densities, the solutions obtained in the G. or E. situations are only mildly different in this respect (see Table 1).

As a final comment, recall that the results presented above have been obtained with a fairly small sample size of $N = 5{,}000$. Increasing $N$ quickly reduces the failure rate of all algorithms: for $N = 20{,}000$ for instance, the failure rate of the plain PMC algorithm drops to 7/100 while the Rao-Blackwellised versions achieve either the G. or E. result (and mostly the G. one, given the chosen initialisation) for all runs.

4 Robustification via mixtures of multivariate t's

We now consider the setting of a proposal composed of a mixture of $p$-dimensional t distributions,

$$\sum_{d=1}^D \alpha_d\, T(\nu_d,\mu_d,\Sigma_d). \tag{13}$$

We here follow the recommendations of West (1992) and Oh and Berger (1993) who proposed using mixtures of t distributions in importance sampling. The t mixture is preferable to a normal mixture because of its heavier tails that can capture a wider range of non-Gaussian targets with a smaller number of components.
This alternative setting is more challenging however and one must take advantage of the missing variable representation of the t distribution itself to achieve a closed-form updating of the parameters $(\mu_d,\Sigma_d)$ approximating (7), since a true closed form cannot be derived.

4.1 The latent-data framework

Using the classical normal/chi-squared decomposition of the t distribution, a joint distribution associated with the t mixture proposal (13) is

$$\begin{aligned}
f(x,y,z) &\propto \alpha_z\, |\Sigma_z|^{-1/2} \exp\left\{-(x-\mu_z)^T \Sigma_z^{-1} (x-\mu_z)\, y / 2\nu_z\right\}\, y^{(\nu_z+p)/2-1}\, e^{-y/2} \\
&\propto \alpha_z\, \varphi(x;\mu_z,\nu_z\Sigma_z/y)\, \varsigma(y;\nu_z/2,1/2),
\end{aligned}$$

where, as above, $x$ corresponds to the observable in (13), $z$ corresponds to the mixture indicator, and $y$ corresponds to the $\chi^2_\nu$ completion. The normal density is denoted by $\varphi$ and the gamma density by $\varsigma$. Both $y$ and $z$ correspond to latent variables in that the integral of the above in $(y,z)$ returns (13).

In the associated PMC algorithm, we only update the expectations and the covariance structures of the t distributions and not the number of degrees of freedom, given that there is no closed-form solution for the latter. In that case, $\theta_d = (\mu_d,\Sigma_d)$ and, for each $d = 1,\ldots,D$, the number of degrees of freedom $\nu_d$ is fixed. At iteration $t$, the integrated EM update of the parameter will involve the following E-step function

$$Q\{(\alpha^t,\theta^t),(\alpha,\theta)\} = \mathbb{E}^X_\pi\left[\mathbb{E}^{Y,Z}_{(\alpha^t,\theta^t)}\left\{\log(\alpha_Z) + \log(\varphi(X;\mu_Z,\nu_Z\Sigma_Z/Y)) \mid X\right\}\right],$$
12 since the χ 2 part oes not involve the parameter θ = (µ,σ). Given that Y,Z X,θ f(y,z x) α z ϕ(x;µ z,ν z Σ z /y)ς(y;ν z /2,1/2), we have that Y X,Z =,θ Ga (ν +p)/2, 1 { 1+(X µ ) T Σ 1 2 (X µ } ] )/ν an therefore ] Q{(α t,θ t ),(α,θ)} = E X π E Z (α t,θ t ) {log(α Z) X} 1 { 2 EX π E Y,Z (α t,θ t ) log Σ Z + (X µ Z) T Σ 1 }] Z (X µ Z )Y ν Z X D ] = E X π ρ (X;α t,θ t )log(α ) =1 1 2 EX π D =1 where we have use both the efinition in (5), { ρ (X;α t,θ t ) log Σ +(X µ ) T Σ 1 (X µ ) } ] ν +p ν +(X µ t )T (Σ t ) 1 (X µ t ), ρ (X;α t,θ t ) = P α t,θ t(z = X) = αt t(x;ν,µ t,σt ) D l=1 αt l t(x;ν l,µ t l,σt l ), with t(x;ν,µ,σ) enoting the T (ν,µ,σ) ensity, an the fact that γ (X;θ t ) = E Y θ t {Y/ν X,Z = } = Therefore, the M step of the integrate EM upate is α t+1 = E X π ρ (X;α t,θ t ) ] ρ (X;α t,θ t )γ (X;θ t )X ] µ t+1 Σ t+1 = EX π = EX π ν +p ν +(X µ t )T (Σ t ) 1 (X µ t ). E X π ρ (X;α t,θ t )γ (X;θ t )] ρ (X;α t,θ t )γ (X;θ t )(X µ t+1 )(X µ t+1 ) T] E X π ρ (X;α t,θ t. )] While the first upate is the generic weight moification (6), the latter formulae are (up to the integration with respect to X) essentially those foun in Peel an McLachlan (2000) for a mixture of t istributions. 4.2 Parameter upate As in Section 3.1, the empirical upate equations are obtaine by using self-normalise IS with weights ω i,t given by (8) for both the numerator an the enominator of each of the above expressions. The 12
13 Rao-Blackwellise approximation base on (10) yiels N α t+1,n = ω i,t ρ (X i,t ;α t,n,θ t,n ), µ t+1,n = Σ t+1,n = ω i,tρ (X i,t ;α t,n,θ t,n )γ (X i,t ;θ t,n )X i,t ω, i,tρ (X i,t ;α t,n,θ t,n )γ (X i,t ;θ t,n ) ω i,tρ (X i,t ;α t,n,θ t,n )γ (X i,t ;θ t,n )(X i,t µ t+1,n ω i,tρ (X i,t ;α t,n,θ t,n ) )(X i,t µ t+1,n ) T while the stanar upate equations, base on (9), are obtaine by replacing ρ (X i,t ;α t,n,θ t,n ) by ½{X i,t = } in the above equations., 4.3 Pima Inian example As a realistic if artificial illustration of the performances of the t mixture (13), we stuy the posterior istribution of the parameters of a probit moel. The corresponing ataset is borrowe from the MASS library of R (R Development Core Team, 2006). It consists in the recors of 532 Pima Inian women who were teste by the U.S. National Institute of Diabetes an Digestive an Kiney Diseases for iabetes. Four quantitative covariates were recore, along with the presence or absence of iabetes. The corresponing probit moel analyses the presence of iabetes, i.e. P β (y = 1 x) = 1 P β (y = 0 x) = Φ(β 0 +x T (β 1,β 2,β 3,β 4 )) with β = (β 0,...,β 4 ), x mae of four covariates, the number of pregnancies, the plasma glucose concentration, the boy mass inex weight in kg/(height in m) 2, an the age, an Φ correspons to the cumulative istribution function of the stanar normal. We use the flat prior istribution π(β X) 1; in that case, the 5-imensional target posterior istribution is such that 532 π(β y,x) Φ{β0 +(x i ) T (β 1,β 2,β 3,β 4 )} ] y i 1 Φ{β0 +(x i ) T (β 1,β 2,β 3,β 4 )} 1 y i where x i is the value of the covariates for the i-th iniviuals an y i is the response of the i-th iniviuals. We first present some results for N = 10,000 sample points an T = 500 iterations on Figures 1 3, base on a mixture with 4 components an with the egrees of freeom chosen as ν = (3,6,9,18), respectively, when using the non Rao-Blackwellise version (9). 
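The probit target of this section can be evaluated up to its normalising constant, as in the minimal sketch below. The helper names (`std_normal_logcdf`, `probit_log_posterior`, `self_normalised_weights`) are hypothetical, the toy data in the usage part are made up for illustration and are not Pima records, and the weight normalisation is the generic self-normalised IS step rather than a transcription of equation (8).

```python
import math


def std_normal_logcdf(u):
    """log Phi(u), computed through erfc for accuracy in the lower tail."""
    p = 0.5 * math.erfc(-u / math.sqrt(2.0))
    # Floor the probability to avoid log(0) when erfc underflows (assumed safeguard).
    return math.log(max(p, 1e-300))


def probit_log_posterior(beta, X, y):
    """Unnormalised log-posterior of the probit model under a flat prior.

    beta: (beta_0, ..., beta_k); X: list of covariate rows; y: list of 0/1 responses.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], x_i))
        # log Phi(eta) if y_i = 1, log(1 - Phi(eta)) = log Phi(-eta) if y_i = 0
        total += std_normal_logcdf(eta) if y_i == 1 else std_normal_logcdf(-eta)
    return total


def self_normalised_weights(log_target, log_proposal):
    """Importance weights proportional to target/proposal, normalised to sum to one."""
    log_w = [lt - lq for lt, lq in zip(log_target, log_proposal)]
    m = max(log_w)  # subtract the maximum before exponentiating for stability
    w = [math.exp(v - m) for v in log_w]
    s = sum(w)
    return [v / s for v in w]


if __name__ == "__main__":
    # Toy data for illustration only (two covariates, three observations).
    X = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.0]]
    y = [1, 0, 1]
    print(probit_log_posterior([0.1, 0.2, -0.3], X, y))
```

Working on the log scale matters here because the likelihood is a product of 532 probabilities, so the unnormalised posterior itself is typically too small to represent directly.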
The unrealistic value of $T$ is chosen purposely to illustrate the lack of stability of the update strategy when not using the Rao-Blackwellised version. Indeed, as can be seen from Figure 1, which describes the evolution of the $\mu_d$'s, some components vary quite widely over iterations, but they also correspond to a rather stable overall estimate of $\beta$, $\sum_{i=1}^{N} \omega_{i,t}\, \beta_{i,t}$, equal to $(-5.54, 0.051, 0.019, 0.055, 0.022)$ over most iterations. When looking at Figure 3, the quasi-constant entropy estimate after iteration 100 or so shows that, even in this situation, there is little need to continue the iterations up to the 500th.

Using a Rao-Blackwellised version of the updates shows a strong stabilisation of the updates of the parameters $\alpha_d$ and $(\mu_d, \Sigma_d)$, both in the number of iterations and in the range of the parameters. The approximation to the Bayes estimate is obviously very close to the above estimate: $(-5.63, 0.052, 0.019, 0.056, 0.022)$. Figures 4 and 5 show the immediate stabilisation provided by the Rao-Blackwellisation step.
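The Rao-Blackwellised update of Section 4.2 and the entropy estimate monitored in the figures can be sketched as follows. This is a simplified illustration under stated assumptions: it covers only the univariate case $p = 1$ (so $\Sigma_d$ reduces to a scalar `sigma2`), assumes all $\alpha_d > 0$, takes the self-normalised weights as given, and uses hypothetical function names (`rb_update`, `entropy_estimate`).

```python
import math


def t_logpdf(x, nu, mu, sigma2):
    """Log-density of a univariate Student t, T(nu, mu, sigma2), at x."""
    z2 = (x - mu) ** 2 / sigma2
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * sigma2)
            - (nu + 1) / 2 * math.log1p(z2 / nu))


def rb_update(xs, ws, alphas, nus, mus, sigma2s):
    """One Rao-Blackwellised update of (alpha_d, mu_d, sigma2_d), with nu_d kept fixed.

    xs: points drawn from the current mixture; ws: self-normalised weights (sum to 1).
    """
    D, p = len(alphas), 1
    rho, gam = [], []
    for x in xs:
        # rho[i][d]: posterior probability that x_i belongs to component d
        logs = [math.log(alphas[d]) + t_logpdf(x, nus[d], mus[d], sigma2s[d])
                for d in range(D)]
        m = max(logs)
        unnorm = [math.exp(v - m) for v in logs]
        s = sum(unnorm)
        rho.append([u / s for u in unnorm])
        # gam[i][d] = (nu_d + p) / (nu_d + (x_i - mu_d)^2 / sigma2_d)
        gam.append([(nus[d] + p) / (nus[d] + (x - mus[d]) ** 2 / sigma2s[d])
                    for d in range(D)])
    new_alphas, new_mus, new_sigma2s = [], [], []
    for d in range(D):
        a = sum(w * r[d] for w, r in zip(ws, rho))
        wrg = [w * r[d] * g[d] for w, r, g in zip(ws, rho, gam)]
        mu = sum(c * x for c, x in zip(wrg, xs)) / sum(wrg)
        # The scale update is normalised by sum(w * rho), not by sum(w * rho * gamma).
        s2 = sum(c * (x - mu) ** 2 for c, x in zip(wrg, xs)) / a
        new_alphas.append(a)
        new_mus.append(mu)
        new_sigma2s.append(s2)
    return new_alphas, new_mus, new_sigma2s


def entropy_estimate(xs, ws, alphas, nus, mus, sigma2s):
    """Self-normalised IS estimate of E_pi[log q(X)] for the mixture q."""
    total = 0.0
    for x, w in zip(xs, ws):
        q = sum(a * math.exp(t_logpdf(x, nu, mu, s2))
                for a, nu, mu, s2 in zip(alphas, nus, mus, sigma2s))
        total += w * math.log(q)
    return total
```

Tracking `entropy_estimate` across iterations gives a simple way to decide when to stop adapting, in the spirit of the quasi-constant entropy curve discussed above.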
Figure 1: Pima Indians: Evolution of the components of the five $\mu_d$'s over 500 iterations plotted by pairs: (clockwise from upper left side) (1,2), (3,4), (4,1) and (2,3). The colour code is blue for $\mu_1$, yellow for $\mu_2$, brown for $\mu_3$ and red for $\mu_4$. The additional dark path corresponds to the estimate of $\beta$. All $\mu_d$'s were started in the vicinity of the MLE $\hat\beta$.
Figure 2: Pima Indians: Evolution of the five $\Sigma_d$'s over 500 iterations plotted by pairs for the diagonal elements: (clockwise from upper left side) (1,2), (3,4), (4,1) and (2,3). The colour code is blue for $\Sigma_1$, yellow for $\Sigma_2$, brown for $\Sigma_3$ and red for $\Sigma_4$. All $\Sigma_d$'s were started at the covariance matrix of $\hat\beta$ produced by the R glm() procedure.
Figure 3: Pima Indians: Evolution of the cumulated weights (top) and of the estimated entropy divergence $\mathbb{E}_\pi[\log(q_{\alpha,\theta}(\beta))]$ (bottom).
Figure 4: Pima Indians: Evolution of the components of the five $\mu_d$'s over 50 Rao-Blackwellised iterations plotted by pairs: (clockwise from upper left side) (1,2), (3,4), (4,1) and (2,3). The colour code is blue for $\mu_1$, yellow for $\mu_2$, brown for $\mu_3$ and red for $\mu_4$. The additional dark path corresponds to the estimate of $\beta$. All $\mu_d$'s were started in the vicinity of the MLE $\hat\beta$.
Figure 5: Pima Indians: Evolution of the cumulated weights (top) and of the estimated entropy divergence $\mathbb{E}_\pi[\log(q_{\alpha,\theta}(\beta))]$ (bottom) for the Rao-Blackwellised version.
5 Conclusions

The proposed algorithm provides a flexible and robust framework for adapting general importance sampling densities represented as mixtures. The extension to mixtures of $t$ distributions broadens the scope of the method by allowing the approximation of heavier-tailed targets. Moreover, we can extend here the remarks made in Douc et al. (2007a,b), namely that the update mechanism provides an early stabilisation of the parameters of the mixture. It is therefore unnecessary to rely on a large value of $T$: with large enough sample sizes $N$ at each iteration (especially on the initial iteration, which requires many points to counterbalance a potentially poor initial proposal), it is quite uncommon to fail to spot a stabilisation of both the estimates and the entropy criterion within a few iterations.

While this paper relies on the generic entropy criterion to update the mixture density, we want to stress that it is also possible to use a more focussed deviance criterion, namely the $h$-entropy

$$\mathcal{E}_h(\pi, q_{(\alpha,\theta)}) = \mathcal{D}(\pi_h \,\|\, q_{(\alpha,\theta)}), \tag{14}$$

with

$$\pi_h(x) \propto |h(x) - \pi(h)|\, \pi(x),$$

that is tuned to the estimation of a particular function $h$, as it is well known that the optimal choice of the importance density for the self-normalised importance sampling estimator is exactly $\pi_h$. Since the normalising constant in $\pi_h$ does not need to be known, one can derive an adaptive algorithm that resembles the method presented in this paper. It is expected that this modification will be helpful in reaching IS densities that provide a low approximation error for a specific function $h$, which is also an important feature of importance sampling in several applications.

References

Cappé, O., Guillin, A., Marin, J., and Robert, C. (2004). Population Monte Carlo. J. Comput. Graph. Statist., 13(4).

Cappé, O., Moulines, E., and Rydén, T. (2005). Inference in Hidden Markov Models. Springer-Verlag, New York.

Chen, R. and Liu, J. S. (1996). Predictive updating method and Bayesian classification. J. Royal Statist. Soc. Series B, 58(2).

Douc, R., Guillin, A., Marin, J.-M., and Robert, C. (2007a). Convergence of adaptive mixtures of importance sampling schemes. Ann. Statist., 35(1).

Douc, R., Guillin, A., Marin, J.-M., and Robert, C. (2007b). Minimum variance importance sampling via population Monte Carlo. ESAIM: Probability and Statistics, 11.

Doucet, A., de Freitas, N., and Gordon, N. (2001). Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York.

Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57.

Hesterberg, T. (1995). Weighted average importance sampling and defensive mixture distributions. Technometrics, 37(2).

Oh, M. and Berger, J. (1993). Integration of multimodal functions by Monte Carlo importance sampling. J. American Statist. Assoc., 88.

Peel, D. and McLachlan, G. (2000). Robust mixture modelling using the t distribution. Statistics and Computing, 10.

R Development Core Team (2006). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York, second edition.

Rubinstein, R. Y. and Kroese, D. P. (2004). The Cross-Entropy Method. Springer-Verlag, New York.

West, M. (1992). Modelling with mixtures. In Berger, J., Bernardo, J., Dawid, A., and Smith, A., editors, Bayesian Statistics 4. Oxford University Press, Oxford.
MAE 494/598, Fall 2016 Project #1 (Regular tasks = 20 points) Har copy of report is ue at the start of class on the ue ate. The rules on collaboration will be release separately. Please always follow the
More informationOne-dimensional I test and direction vector I test with array references by induction variable
Int. J. High Performance Computing an Networking, Vol. 3, No. 4, 2005 219 One-imensional I test an irection vector I test with array references by inuction variable Minyi Guo School of Computer Science
More informationComparative Approaches of Calculation of the Back Water Curves in a Trapezoidal Channel with Weak Slope
Proceeings of the Worl Congress on Engineering Vol WCE, July 6-8,, Lonon, U.K. Comparative Approaches of Calculation of the Back Water Curves in a Trapezoial Channel with Weak Slope Fourar Ali, Chiremsel
More informationEVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES
MATHEMATICS OF COMPUTATION Volume 69, Number 231, Pages 1117 1130 S 0025-5718(00)01120-0 Article electronically publishe on February 17, 2000 EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION
More informationA simple model for the small-strain behaviour of soils
A simple moel for the small-strain behaviour of soils José Jorge Naer Department of Structural an Geotechnical ngineering, Polytechnic School, University of São Paulo 05508-900, São Paulo, Brazil, e-mail:
More informationThe Press-Schechter mass function
The Press-Schechter mass function To state the obvious: It is important to relate our theories to what we can observe. We have looke at linear perturbation theory, an we have consiere a simple moel for
More informationAnalytic Scaling Formulas for Crossed Laser Acceleration in Vacuum
October 6, 4 ARDB Note Analytic Scaling Formulas for Crosse Laser Acceleration in Vacuum Robert J. Noble Stanfor Linear Accelerator Center, Stanfor University 575 San Hill Roa, Menlo Park, California 945
More informationLDA Collapsed Gibbs Sampler, VariaNonal Inference. Task 3: Mixed Membership Models. Case Study 5: Mixed Membership Modeling
Case Stuy 5: Mixe Membership Moeling LDA Collapse Gibbs Sampler, VariaNonal Inference Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox May 8 th, 05 Emily Fox 05 Task : Mixe
More informationA Weak First Digit Law for a Class of Sequences
International Mathematical Forum, Vol. 11, 2016, no. 15, 67-702 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1288/imf.2016.6562 A Weak First Digit Law for a Class of Sequences M. A. Nyblom School of
More informationarxiv:hep-th/ v1 3 Feb 1993
NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,
More informationJUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson
JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises
More informationAll s Well That Ends Well: Supplementary Proofs
All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee
More informationOptimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations
Optimize Schwarz Methos with the Yin-Yang Gri for Shallow Water Equations Abessama Qaouri Recherche en prévision numérique, Atmospheric Science an Technology Directorate, Environment Canaa, Dorval, Québec,
More informationDelocalization of boundary states in disordered topological insulators
Journal of Physics A: Mathematical an Theoretical J. Phys. A: Math. Theor. 48 (05) FT0 (pp) oi:0.088/75-83/48//ft0 Fast Track Communication Delocalization of bounary states in isorere topological insulators
More information