CONVERGENCE OF ADAPTIVE MIXTURES OF IMPORTANCE SAMPLING SCHEMES


The Annals of Statistics 2007, Vol. 35, No. 1. Institute of Mathematical Statistics, 2007.

CONVERGENCE OF ADAPTIVE MIXTURES OF IMPORTANCE SAMPLING SCHEMES

BY R. DOUC, A. GUILLIN, J.-M. MARIN AND C. P. ROBERT

École Polytechnique, École Centrale Marseille and LATP, CNRS, INRIA Futurs, Projet Select, Université Orsay and Université Paris-Dauphine and CREST-INSEE

In the design of efficient simulation algorithms, one is often beset with a poor choice of proposal distributions. Although the performance of a given simulation kernel can clarify a posteriori how adequate this kernel is for the problem at hand, a permanent on-line modification of kernels causes concerns about the validity of the resulting algorithm. While the issue is most often intractable for MCMC algorithms, the equivalent version for importance sampling algorithms can be validated quite precisely. We derive sufficient convergence conditions for adaptive mixtures of population Monte Carlo algorithms and show that Rao-Blackwellized versions asymptotically achieve an optimum in terms of a Kullback divergence criterion, while more rudimentary versions do not benefit from repeated updating.

Received January 2005; revised December. Supported in part by an ACI Nouvelles Interfaces des Mathématiques grant from the Ministère de la Recherche.

AMS 2000 subject classifications. 60F05, 62L12, 65-04, 65C05, 65C40, 65C60.

Key words and phrases. Bayesian statistics, Kullback divergence, LLN, MCMC algorithm, population Monte Carlo, proposal distribution, Rao-Blackwellization.

1. Introduction.

1.1. Monte Carlo calibration. In the simulation settings found in optimization and Bayesian integration, it is well documented [20] that the choice of the instrumental distributions is paramount for the efficiency of the resulting algorithms. Indeed, in an importance sampling algorithm with importance function g(x), we are relying on a distribution g that is customarily difficult to calibrate outside a limited range of well-known cases. For instance, a standard result is that the optimal importance density for approximating an integral

$$I = \int f(x)\,\pi(x)\,\mathrm{d}x$$

is $g^\star(x) \propto |f(x)|\,\pi(x)$ ([20], Theorem 3.12), but this formal result is not very informative about the practical choice of g, while a poor choice of g may result in an infinite variance estimator. MCMC algorithms somehow attenuate this difficulty by using local proposals like random walk kernels, but two drawbacks of these proposals are that they may take a long time to converge [18] and that their efficiency ultimately depends on the scale of the local exploration.

The goals of Monte Carlo experiments are multifaceted and therefore the efficiency of an algorithm can be evaluated from many different perspectives. In particular, in Bayesian statistics the Monte Carlo sample can be used to approximate a variety of posterior quantities. Nonetheless, if we try to assess the generic efficiency of an algorithm and thus develop a portmanteau device, a natural approach is to use a measure of agreement between the target and the proposal distribution, similar to the intrinsic loss function proposed in [19] for an invariant estimation of parameters. A robust measure of similarity ubiquitous in statistical approximation theory [7] is the Kullback divergence

$$\mathcal{E}(\pi, \hat\pi) = \int \log\!\left(\frac{\pi(x)}{\hat\pi(x)}\right) \pi(x)\,\mathrm{d}x,$$

and this paper aims to minimize E(π, π̂) within a class of proposals π̂.
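Since this divergence is the yardstick used throughout the paper, a plain Monte Carlo approximation of it is worth spelling out. The sketch below is ours, not the authors' R code; it assumes we can draw from π and evaluate both log-densities (the function names `log_pi`, `log_pi_hat` and `sample_pi` are hypothetical).

```python
import numpy as np

def kullback_divergence(log_pi, log_pi_hat, sample_pi, n=100_000, seed=0):
    """Monte Carlo estimate of E(pi, pi_hat) = E_pi[log pi(X) - log pi_hat(X)]."""
    rng = np.random.default_rng(seed)
    x = sample_pi(n, rng)                      # i.i.d. draws from the target pi
    return float(np.mean(log_pi(x) - log_pi_hat(x)))

# Sanity check against the closed form KL(N(0,1) || N(m,1)) = m^2 / 2:
m = 1.5
est = kullback_divergence(
    log_pi=lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi),
    log_pi_hat=lambda x: -0.5 * (x - m) ** 2 - 0.5 * np.log(2 * np.pi),
    sample_pi=lambda n, rng: rng.standard_normal(n),
)
# est should be close to m**2 / 2 = 1.125
```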

1.2. Adaptivity in Monte Carlo settings. Given the complexity of the original optimization or integration problem (which does itself require Monte Carlo approximations), it is rarely the case that the optimization of the proposal distribution against an efficiency measure can be achieved in closed form. Even the computation of the efficiency measure for a given proposal is impossible in the majority of cases. For this reason, a number of adaptive schemes have appeared in the recent literature [20] in order to design better proposals against a given measure of efficiency without resorting to a standard optimization algorithm. For instance, in the MCMC community, sequential changes in the variance of Markov kernels have been proposed in [13, 14], while adaptive changes taking advantage of regeneration properties of the kernels have been constructed by Gilks, Roberts and Sahu [12] and Sahu and Zhigljavsky [23, 24]. From a more general perspective, Andrieu and Robert [2] have developed a two-level stochastic optimization scheme to update the parameters of a proposal towards a given integrated efficiency criterion, such as the acceptance rate or its difference with a value known to be optimal (see Roberts, Gelman and Gilks [21]). However, as reflected in the general technical report of Andrieu and Robert [2], the complexity of devising valid adaptive MCMC schemes is a genuine drawback to their extension, given that the constraints on the inhomogeneous Markov chain that results from this adaptive construction are either difficult to satisfy or result in a fixed proposal after a certain number of iterations.

Cappé, Guillin, Marin and Robert [3] (see also [20], Chapter 14) developed a methodology called Population Monte Carlo (PMC) [16], motivated by the observation that the importance sampling perspective is much more amenable to adaptivity than MCMC, due to its unbiased nature: using sampling importance resampling, any given sample from an importance distribution g can be transformed into a sample of points marginally distributed from the target distribution π, and Cappé et al. [3] (see also [8]) showed that this property is also preserved by repeated and adaptive sampling. In this setting, adaptive is to be understood as a modification of the importance distribution based on the results of previous iterations. The asymptotics of adaptive importance sampling are therefore much more manageable than those of adaptive MCMC algorithms, at least at a primary level, if only because the algorithm can be stopped at any time. Indeed, since every iteration is a valid importance sampling scheme, the algorithm does not require a burn-in time. Borrowing from the sequential sampling literature [11], the methodology of Cappé et al. [3] thus aimed at producing an adaptive importance sampling scheme via a learning mechanism on a population of points, themselves marginally distributed from the target distribution. However, as shown by the current paper, the original implementation of Cappé et al. [3] may suffer from an asymptotic lack of adaptivity that can be overcome by Rao-Blackwellization.

1.3. Plan and objectives. This paper focuses on a specific family of importance functions that are represented as mixtures of an arbitrary number of fixed proposals. These proposals can be educated guesses of the target distribution, random walk proposals for local exploration of the target, nonparametric kernel approximations to the target or any combination of these. Using these fixed proposals as a basis, we then devise an updating mechanism for the weights of the mixture and prove that this mechanism converges to the optimal mixture, the optimality being defined here in terms of Kullback divergence. From a probabilistic point of view, the techniques used in this paper are related to techniques and results found in [4, 6, 17]. In particular, the triangular array technique that is central to the CLT proofs below can be found in [4, 10].

The paper is organized as follows. We first present the algorithmic and mathematical details in Section 2 and establish a generic central limit theorem. We evaluate the convergence properties of the basic version of PMC in Section 3, exhibiting its limitations, and show in Section 5 that its Rao-Blackwellized version overcomes these limitations and achieves optimality for the Kullback criterion developed in Section 4. Section 6 illustrates the practical convergence of the method on a few benchmark examples.

2. Population Monte Carlo. The Population Monte Carlo (PMC) algorithm introduced in [3] is a form of iterated sampling importance resampling (SIR). The appeal of using a repeated form of SIR is that previous samples are informative about the connections between the proposal (importance) and the target distributions. We stress from the outset that this scheme has very few connections with MCMC algorithms since (a) PMC is not Markovian, being based on the whole sequence of simulations, and (b) PMC can be stopped at any time, being validated by the basic importance sampling identity ([20], equation (3.9)) rather than by a probabilistic convergence result like the ergodic theorem. These features motivate the use of the method in setups where off-the-shelf MCMC algorithms cannot be of use. We first recall basic Monte Carlo principles, mostly to define notation and to make precise our goals.

2.1. The Monte Carlo framework. On a measurable space (X, A), we are given a target, that is, a probability distribution π on (X, A). We assume that π is dominated by a reference measure µ (π ≪ µ) and also denote by π(x) its density, π(dx) = π(x)µ(dx). In most settings, including Bayesian statistics, the density π is known up to a normalizing constant, π(x) ∝ π̃(x). The purpose of running a simulation experiment with the target π is to approximate quantities related to π, such as intractable integrals π(f) = ∫ f(x)π(dx), but we do not focus here on a specific quantity π(f).

In this setting, a standard stochastic approximation method is the Monte Carlo method, based on an i.i.d. sample x_1, ..., x_N simulated from π, that approximates π(f) by

$$\hat\pi^{MC}_N(f) = \frac{1}{N} \sum_{i=1}^{N} f(x_i),$$

which almost surely converges to π(f) as N goes to infinity by the law of large numbers (LLN). The central limit theorem (CLT) implies, in addition, that if π(f²) = ∫ f²(x)π(dx) < ∞, then

$$\sqrt{N}\,\{\hat\pi^{MC}_N(f) - \pi(f)\} \xrightarrow{\mathcal{L}} \mathcal{N}\big(0,\, V_\pi(f)\big),$$

where V_π(f) = π([f − π(f)]²). Obviously, this approach requires a direct i.i.d. simulation from π or π̃, which often is impossible. An alternative ([20], Chapter 3) is to use importance sampling, that is, to choose a probability distribution ν ≪ µ on (X, A), called the proposal or importance distribution, the density of which is also denoted by ν, and to estimate π(f) by

$$\hat\pi^{IS}_{\nu,N}(f) = \frac{1}{N} \sum_{i=1}^{N} \frac{\pi}{\nu}(x_i)\, f(x_i).$$

If π is also dominated by ν (π ≪ ν), then π̂^IS_{ν,N}(f) almost surely converges to π(f) and, if ν(f²(π/ν)²) < ∞, the CLT also applies, that is,

$$\sqrt{N}\,\{\hat\pi^{IS}_{\nu,N}(f) - \pi(f)\} \xrightarrow{\mathcal{L}} \mathcal{N}\big(0,\, V_\nu(f\,\pi/\nu)\big).$$

As the normalizing constant of the target distribution π is unknown, it is not possible to directly use the IS estimator π̂^IS_{ν,N}(f). A convenient substitute is the self-normalized IS estimator,

$$\hat\pi^{SIS}_{\nu,N}(f) = \sum_{i=1}^{N} \frac{\pi}{\nu}(x_i)\, f(x_i) \Big/ \sum_{i=1}^{N} \frac{\pi}{\nu}(x_i),$$

which also converges almost surely to π(f).
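To fix ideas, here is a minimal sketch (ours, not the paper's code) of the two importance sampling estimators just defined; the plain estimator needs the normalized density π, while the self-normalized one works with an unnormalized π̃. All argument names are ours.

```python
import numpy as np

def is_estimate(f, log_pi, log_nu, sample_nu, n, seed=0):
    """Plain IS estimator of pi(f); requires the *normalized* density pi."""
    rng = np.random.default_rng(seed)
    x = sample_nu(n, rng)                     # i.i.d. draws from the proposal nu
    w = np.exp(log_pi(x) - log_nu(x))         # importance weights pi/nu
    return float(np.mean(w * f(x)))

def sis_estimate(f, log_pi_tilde, log_nu, sample_nu, n, seed=0):
    """Self-normalized IS estimator; pi may be known only up to a constant."""
    rng = np.random.default_rng(seed)
    x = sample_nu(n, rng)
    logw = log_pi_tilde(x) - log_nu(x)        # log unnormalized weights
    w = np.exp(logw - logw.max())             # rescaling avoids overflow
    return float(np.sum(w * f(x)) / np.sum(w))
```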

If ν((1 + f²)(π/ν)²) < ∞, then the CLT applies:

$$\sqrt{N}\,\{\hat\pi^{SIS}_{\nu,N}(f) - \pi(f)\} \xrightarrow{\mathcal{L}} \mathcal{N}\big(0,\, V_\nu\big([f - \pi(f)]\,\pi/\nu\big)\big).$$

Obviously, the quality of the IS and SIS approximations strongly depends on the choice of the proposal distribution ν, which is delicate for complex distributions like those that occur in high-dimensional problems. While multiplication of the number of proposals may offer some reprieve in this regard, we must stress at this point that our PMC methodology also suffers from the curse of dimensionality to which all importance sampling methods are subject, in the sense that high-dimensional problems require a considerable increase in computational power.

2.2. Sampling importance resampling. The sampling importance resampling (SIR) method of Rubin [22] is an extension of the IS method that achieves simulation from π by adding resampling to simple reweighting. More precisely, the SIR algorithm operates in two stages. The first stage is identical to IS and consists in generating an i.i.d. sample x_1, ..., x_N from ν. The second stage builds a sample from π, x̃_1, ..., x̃_M, based on the instrumental sample x_1, ..., x_N, by resampling. While there are many resampling methods [20], the most standard (if least efficient) approach is multinomial sampling from {x_1, ..., x_N} with probabilities proportional to the importance weights

$$\left[\frac{\pi}{\nu}(x_1), \ldots, \frac{\pi}{\nu}(x_N)\right],$$

that is, the replacement of the weighted sample x_1, ..., x_N by an unweighted sample x̃_1, ..., x̃_M, where x̃_i = x_{J_i} (1 ≤ i ≤ M) and where (J_1, ..., J_M) ∼ M(M, (ϱ_1, ..., ϱ_N)), the multinomial distribution with probabilities

$$\mathbb{P}[J_l = i \mid x_1, \ldots, x_N] = \varrho_i \propto \frac{\pi}{\nu}(x_i), \qquad 1 \le i \le N,\ 1 \le l \le M.$$

The SIR estimator of π(f) is then the standard average

$$\hat\pi^{SIR}_{\nu,N,M}(f) = M^{-1} \sum_{i=1}^{M} f(\tilde{x}_i),$$

which also converges to π(f) since each x̃_i is marginally distributed from π. By construction, the variance of the SIR estimator is greater than the variance of the SIS estimator. Indeed, the expectation of π̂^SIR_{ν,N,M}(f) conditional on the sample x_1, ..., x_N is equal to π̂^SIS_{ν,N}(f). Note, however, that an asymptotic analysis of π̂^SIR_{ν,N,M}(f) is quite delicate because of the dependencies in the SIR sample (which, again, is not an i.i.d. sample from π).
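A hedged sketch of the second (resampling) stage, using multinomial sampling as in the text; `x` holds the instrumental sample and `w` the unnormalized importance weights.

```python
import numpy as np

def multinomial_resample(x, w, m, seed=0):
    """Draw an unweighted sample of size m from the weighted sample (x_i, w_i):
    J_1, ..., J_M are i.i.d. with P[J = i] proportional to w_i."""
    rng = np.random.default_rng(seed)
    p = np.asarray(w, dtype=float)
    p = p / p.sum()                      # normalize the importance weights
    idx = rng.choice(len(p), size=m, p=p)
    return np.asarray(x)[idx]
```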

2.3. The population Monte Carlo algorithm. In their generalization of importance sampling, Cappé et al. [3] introduced an iterative dimension in the production of importance samples, aimed at adapting the importance distribution ν to the target distribution π. Iterations are then used to learn about π from the poor or good performance of earlier proposals, and that performance can be evaluated using different criteria, as, for example, the entropy of the distribution of the importance weights. More precisely, at iteration t (t = 0, 1, ..., T) of the PMC algorithm, a sample of values from the target distribution π is produced by a SIR algorithm whose importance function ν_t depends on t, in the sense that ν_t can be derived from the t − 1 previous realizations of the algorithm, except for the first iteration t = 0, where the proposal distribution ν_0 is chosen as in a regular importance sampling experiment. A generic rendering of the algorithm is thus as follows:

PMC ALGORITHM. At time t = 0, 1, ..., T,

1. generate (x_{i,t})_{1≤i≤N} by i.i.d. sampling from ν_t and compute the normalized importance weights ω_{i,t};
2. resample (x̃_{i,t})_{1≤i≤N} from (x_{i,t})_{1≤i≤N} by multinomial sampling with weights (ω_{i,t});
3. construct ν_{t+1} based on (x_{i,τ}, ω_{i,τ})_{1≤i≤N, 0≤τ≤t}.

At this stage of the description of the PMC algorithm, we do not give in detail the construction of ν_t, which can thus arbitrarily depend on the past simulations; Section 3 studies a particular choice of the ν_t's in detail. A major finding of Cappé et al. [3] is, however, that the dependence of ν_t on earlier proposals and realizations does not jeopardize the fundamental importance sampling identity. Local adaptive importance sampling schemes can thus be chosen in much wider generality than was previously thought, and this shows that, thanks to the introduction of a temporal dimension in the selection of the importance function, an adaptive perspective can be adopted at little cost, with a potentially large gain in efficiency.

Obviously, when the construction of the proposal distribution ν_t is completely open, there is no guarantee of permanent improvement of the simulation scheme, whatever the criterion adopted to measure this improvement. For instance, as illustrated by Cappé et al. [3], a constant dependence of ν_t on the past sample quickly leads to stable behavior. A more extreme illustration is the case where the sequence (ν_t) degenerates into a quasi-Dirac mass at a value based on earlier simulations and where the performance of the resulting PMC algorithm worsens. In order to study the positive and negative effects of the update of the importance function ν_t, we hereafter consider a special class of proposals based on mixtures and study two particular updating schemes, one in which improvement does not occur and one in which it does.
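To make the generic scheme concrete, here is a minimal, hedged Python skeleton (the paper's own programs are in R and are not reproduced here). The construction of ν_{t+1} in step 3 is deliberately delegated to a user-supplied callback `build_next`, mirroring the fact that the algorithm statement leaves it open; all names are ours.

```python
import numpy as np

def pmc(sample_nu0, log_nu0, log_pi_tilde, build_next, n, T, seed=0):
    """Generic PMC skeleton: each iteration is importance sampling from nu_t
    followed by multinomial resampling; build_next maps the past output to
    the next proposal as a pair (sample_nu, log_nu)."""
    rng = np.random.default_rng(seed)
    sample_nu, log_nu = sample_nu0, log_nu0
    history = []
    for t in range(T + 1):
        x = sample_nu(n, rng)                      # step 1: draw from nu_t
        logw = log_pi_tilde(x) - log_nu(x)
        w = np.exp(logw - logw.max())
        w /= w.sum()                               # normalized weights
        x_tilde = x[rng.choice(n, size=n, p=w)]    # step 2: resample
        history.append((x, w))
        if t < T:                                  # step 3: adapt the proposal
            sample_nu, log_nu = build_next(history, x_tilde)
    return history
```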

3. The D-kernel PMC algorithm. In this section and the following ones, we study adaptivity for a particular type of parameterized PMC scheme, in the case where ν_t is a mixture of measures ζ_d (1 ≤ d ≤ D) that are chosen prior to the simulation experiment, based either on an educated guess about π or on local approximations as in nonparametric kernel estimation. The dependence of the ν_t's on the past simulations is of the form

$$\nu_t(x) = \sum_{d=1}^{D} \alpha_{t,d}\, \zeta_d\big(\{(\tilde{x}_{i,t-1}, \omega_{i,t-1})\}_{1\le i\le N},\, x\big) = \sum_{d=1}^{D} \alpha_{t,d} \sum_{j=1}^{N} \omega_{j,t-1}\, Q_d(\tilde{x}_{j,t-1}, x),$$

where the ω_{i,t}'s denote the importance weights and the transition kernels Q_d (1 ≤ d ≤ D) are given. This situation is rather common in MCMC settings where several competing transition kernels are simultaneously available, but difficult to compare. For instance, the cycle and mixture MCMC schemes discussed by Tierney [25] are of this nature. Hereafter, we thus focus on adapting the weights α_{t,d} (1 ≤ d ≤ D) toward a better fit with the target distribution.

A natural approach to updating the weights α_{t,d} (1 ≤ d ≤ D) is to favor those kernels Q_d that lead to a high acceptance probability in the resampling step of the PMC algorithm. We thus choose the α_{t,d}'s to be proportional to the survival rates of the corresponding Q_d's in the previous resampling step [since using the double mixture of the measures Q_d(x̃_{j,t-1}, x) means that we first select a point x̃_{j,t-1} in the previous sample with probability ω_{j,t-1}, then select a component d with probability α_{t,d} and, finally, simulate from Q_d(x̃_{j,t-1}, x)].

3.1. Algorithmic details. The family (Q_d)_{1≤d≤D} of transition kernels on X × A is such that Q_d(x, ·) (1 ≤ d ≤ D, x ∈ X) is dominated by the reference measure µ introduced earlier. As above, we denote the corresponding target density and transition kernel densities by π and q_d (1 ≤ d ≤ D), respectively. The associated PMC algorithm then updates the proposal weights and generates the corresponding samples from π as follows:

D-KERNEL PMC ALGORITHM. At time 0,

(a) generate (x_{i,0})_{1≤i≤N} i.i.d. ∼ ν_0 and compute the normalized importance weights ω_{i,0} ∝ {π/ν_0}(x_{i,0});
(b) resample (x_{i,0})_{1≤i≤N} into (x̃_{i,0})_{1≤i≤N} by multinomial sampling, (J_{i,0})_{1≤i≤N} ∼ M(N, (ω_{i,0})_{1≤i≤N}), that is, x̃_{i,0} = x_{J_{i,0},0};
(c) set α_{1,d} = 1/D (1 ≤ d ≤ D).

At time t = 1, ..., T,

(a) select the mixture components (K_{i,t})_{1≤i≤N} i.i.d. ∼ M(1, (α_{t,d})_{1≤d≤D});
(b) generate independent x_{i,t} ∼ q_{K_{i,t}}(x̃_{i,t-1}, x) (1 ≤ i ≤ N) and compute the normalized importance weights ω_{i,t} ∝ π(x_{i,t})/q_{K_{i,t}}(x̃_{i,t-1}, x_{i,t});
(c) resample (x_{i,t})_{1≤i≤N} into (x̃_{i,t})_{1≤i≤N} by multinomial sampling with weights (ω_{i,t})_{1≤i≤N};
(d) set α_{t+1,d} = Σ_{i=1}^N ω_{i,t} I(K_{i,t} = d) (1 ≤ d ≤ D).

In this implementation, at time t ≥ 1, step (a) chooses a kernel index in the mixture for each point of the sample, while step (d) updates the weight α_d as the relative importance of kernel Q_d in the current round or, in other words, as the relative survival rate of the points simulated from kernel Q_d. Indeed, since the survival of a simulated value x_{i,t} is driven by its importance weight ω_{i,t}, reweighting is related to the respective magnitudes of the importance weights for the different kernels. Also, note that the resampling step (c) is used to avoid the propagation of very small importance weights along iterations and the subsequent degeneracy phenomenon that plagues iterated IS schemes like particle filters [11]. At time t = T, the algorithm should thus stop at step (b) for integral approximations to be based on the weighted sample (x_{i,T}, ω_{i,T})_{1≤i≤N}.
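A hedged sketch of the scheme above (ours, not the authors' R code). Each entry of `kernels` is assumed to be a pair `(sample, logpdf)`, where `sample(x_prev, rng)` draws one point from Q_d(x_prev, ·) per row of `x_prev` and `logpdf(x_prev, x)` evaluates log q_d(x_prev, x).

```python
import numpy as np

def d_kernel_pmc(sample_nu0, log_nu0, log_pi_tilde, kernels, n, T, seed=0):
    """Sketch of the D-kernel PMC algorithm of Section 3.1."""
    rng = np.random.default_rng(seed)
    D = len(kernels)
    x = sample_nu0(n, rng)                         # time 0: plain SIR step
    logw = log_pi_tilde(x) - log_nu0(x)
    w = np.exp(logw - logw.max()); w /= w.sum()
    x_tilde = x[rng.choice(n, size=n, p=w)]
    alpha = np.full(D, 1.0 / D)
    for _ in range(T):
        K = rng.choice(D, size=n, p=alpha)         # (a) pick kernel indices
        x = np.empty_like(x_tilde)
        logq = np.empty(n)
        for d in range(D):                         # (b) propagate and weight
            sel = (K == d)
            if sel.any():
                x[sel] = kernels[d][0](x_tilde[sel], rng)
                logq[sel] = kernels[d][1](x_tilde[sel], x[sel])
        logw = log_pi_tilde(x) - logq              # conditional-on-K weights
        w = np.exp(logw - logw.max()); w /= w.sum()
        x_tilde = x[rng.choice(n, size=n, p=w)]    # (c) resample
        alpha = np.array([w[K == d].sum() for d in range(D)])  # (d) survival rates
    return x, w, alpha
```

As Proposition 3.1 in the next subsection shows, the survival-rate update in step (d) converges to the uniform weights 1/D as N grows, so this sketch reproduces exactly the non-adaptive behavior analyzed below.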

3.2. Convergence properties. In order to assess the impact of this update mechanism on the performance of the PMC scheme, we now consider the convergence properties of the above algorithm when the sample size N grows to infinity. Indeed, as already pointed out in Cappé et al. [3], it does not make sense to consider the alternative asymptotics of the PMC scheme, namely, when T grows to infinity, given that this algorithm is intended to be run with a small number T of iterations.

A basic assumption on the kernels Q_d is that the corresponding importance weights are almost surely finite, that is,

(A1) for every d ∈ {1, ..., D}, π ⊗ π{q_d(x, x′) = 0} = 0,

where ξ ⊗ ζ denotes the product measure, that is, ξ ⊗ ζ(A × B) = ∫_{A×B} ξ(dx) ζ(dy). We denote by γ_u the uniform distribution on {1, ..., D}, that is, γ_u(k) = 1/D for all k ∈ {1, ..., D}. The following result (whose proof is given in Appendix B) is a general LLN on the pairs (x_{i,t}, K_{i,t}) produced by the above algorithm.

PROPOSITION 3.1. Under (A1), for any π ⊗ γ_u-integrable function h and every t,

$$\sum_{i=1}^{N} \omega_{i,t}\, h(x_{i,t}, K_{i,t}) \xrightarrow{P} \pi \otimes \gamma_u(h).$$

Note that this convergence result is more than we need for Monte Carlo purposes, since the K_{i,t}'s are auxiliary variables that are not relevant to the original problem. However, the consequences of this general result are far from negligible. First, it implies that the approximation provided by the algorithm is convergent, in the sense that Σ_i ω_{i,t} f(x_{i,t}) is a convergent estimator of π(f). But, more importantly, it also shows that, for t ≥ 1,

$$\alpha_{t+1,d} = \sum_{i=1}^{N} \omega_{i,t}\, \mathbb{I}(K_{i,t} = d) \xrightarrow{P} 1/D.$$

Therefore, at each iteration, the weights of all kernels converge to 1/D when the sample size N grows to infinity. This translates into a lack of learning properties for the D-kernel PMC algorithm: its properties at iteration 1 and at iteration 10 are the same. In other words, this algorithm is not adaptive and only requires one iteration for a large value of N. We can also relate this to the fast stabilization of the approximation in [3]. Note that a CLT can be established for this algorithm but, given its unappealing features, we leave the exercise to the interested reader.

4. The Kullback divergence. In order to obtain a correct and adaptive version of the D-kernel algorithm, we must first choose an effective criterion to evaluate both the adaptivity and the approximation of the target distribution by the proposal distribution. We then propose a modification of the original D-kernel algorithm that achieves efficiency in this sense.

As argued in many papers, using a wide range of arguments, a natural choice of approximation metric is the Kullback divergence. We thus aim to derive the D-kernel mixture that minimizes the Kullback divergence between this mixture and the target measure π,

(1) $$\int \log\!\left(\frac{\pi(x)\,\pi(x')}{\pi(x) \sum_{d=1}^{D} \alpha_d\, q_d(x, x')}\right) \pi(\mathrm{d}x)\, \pi(\mathrm{d}x').$$

This section is devoted to the problem of finding an iterative choice of the mixing coefficients α_d that converge to this minimum. A detailed description of the optimal PMC scheme is given in Section 5.

4.1. The criterion. Using the same notation as above, in conjunction with the choice of weights α_d in the D-kernel mixture, we introduce the simplex of R^D,

$$\mathcal{S} = \Big\{ \alpha = (\alpha_1, \ldots, \alpha_D);\ \forall d \in \{1, \ldots, D\},\ \alpha_d \ge 0 \ \text{and}\ \sum_{d=1}^{D} \alpha_d = 1 \Big\},$$

and denote π̄ = π ⊗ π. We now assume that the D kernels also satisfy the condition

(A2) for every d ∈ {1, ..., D}, E_π̄[|log q_d(X, X′)|] = ∫ |log q_d(x, x′)| π̄(dx, dx′) < ∞,

which is automatically satisfied when all the ratios π/q_d are bounded. From the Kullback divergence, we derive a function on S such that, for α ∈ S,

$$E_{\bar\pi}(\alpha) = \int \log\Big[\sum_{d=1}^{D} \alpha_d\, q_d(x, x')\Big]\, \bar\pi(\mathrm{d}x, \mathrm{d}x') = \mathbb{E}_{\bar\pi}\Big[\log\Big(\sum_{d=1}^{D} \alpha_d\, q_d(X, X')\Big)\Big].$$

By virtue of Jensen's inequality, E_π̄(α) ≤ ∫ π(x) log π(x) dx for all α ∈ S. Note that, due to the strict concavity of the log function, E_π̄ is a strictly concave function on a connected compact set and thus has no local maximum besides the global maximum, denoted α^max. Since the Kullback divergence (1) can be written as

$$\int \log \pi(x)\, \pi(\mathrm{d}x) - E_{\bar\pi}(\alpha),$$

α^max is the optimal vector of weights for a mixture of transition kernels, in the sense that the product of π by this mixture is the closest to the product distribution π̄.

4.2. A maximization algorithm. We now study an iterative procedure, akin to the EM algorithm, that updates the weights so that the function E_π̄(α) increases at each step. Defining the function F on S as

$$F(\alpha) = \left( \alpha_d\, \mathbb{E}_{\bar\pi}\left[ \frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j\, q_j(X, X')} \right] \right)_{1 \le d \le D},$$

we construct the sequence (α^t)_{t≥1} on S such that α^1 = (1/D, ..., 1/D) and α^{t+1} = F(α^t) for t ≥ 1. Note that, under assumption (A1), for all t ≥ 0,

$$\mathbb{E}_{\bar\pi}\left[ q_d(X, X') \Big/ \sum_{j=1}^{D} \alpha_j^t\, q_j(X, X') \right] > 0$$

and thus, for all t ≥ 0 and 1 ≤ d ≤ D, we have α_d^t > 0.
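In practice, the expectation defining F is approximated by simulation. The following minimal sketch (ours) assumes a matrix `q_vals` of kernel density evaluations at M pairs approximately distributed from π̄; in the PMC implementation of Section 5, such pairs are produced by the algorithm itself.

```python
import numpy as np

def F_update(alpha, q_vals):
    """One step of alpha -> F(alpha).  q_vals[d, i] = q_d(x_i, x'_i) for M
    pairs (x_i, x'_i) drawn (approximately) from pi-bar = pi (x) pi."""
    denom = alpha @ q_vals                  # sum_j alpha_j q_j(x_i, x'_i)
    alpha = alpha * np.mean(q_vals / denom, axis=1)
    return alpha / alpha.sum()              # guard against Monte Carlo drift

def iterate_to_optimum(q_vals, T=100):
    """Iterate alpha^{t+1} = F(alpha^t) from the uniform start alpha^1."""
    alpha = np.full(q_vals.shape[0], 1.0 / q_vals.shape[0])
    for _ in range(T):
        alpha = F_update(alpha, q_vals)
    return alpha
```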

If we define the extremal set

$$\mathcal{I}_D = \left\{ \alpha \in \mathcal{S};\ \forall d \in \{1, \ldots, D\},\ \text{either } \alpha_d = 0 \text{ or } \mathbb{E}_{\bar\pi}\left[\frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j\, q_j(X, X')}\right] = 1 \right\},$$

we then have the following fixed-point result:

PROPOSITION 4.1. Under (A1) and (A2),

(i) E_π̄ ∘ F − E_π̄ is continuous;
(ii) for all α ∈ S, E_π̄(F(α)) ≥ E_π̄(α);
(iii) I_D = {α ∈ S; F(α) = α} = {α ∈ S; E_π̄(F(α)) = E_π̄(α)} and I_D is finite.

PROOF. E_π̄ is clearly continuous. Moreover, by Lebesgue's dominated convergence theorem, the function

$$\alpha \mapsto \mathbb{E}_{\bar\pi}\left[\frac{\alpha_d\, q_d(X, X')}{\sum_{j=1}^{D} \alpha_j\, q_j(X, X')}\right]$$

is also continuous, which implies that F is continuous. This completes the proof of (i).

Due to the concavity of the log function,

$$E_{\bar\pi}(F(\alpha)) - E_{\bar\pi}(\alpha) = \mathbb{E}_{\bar\pi}\left[ \log \sum_{d=1}^{D} \frac{\alpha_d\, q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\, \mathbb{E}_{\bar\pi}\!\left\{\frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right\} \right]$$
$$\ge \mathbb{E}_{\bar\pi}\left[ \sum_{d=1}^{D} \frac{\alpha_d\, q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\, \log \mathbb{E}_{\bar\pi}\!\left\{\frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right\} \right]$$
$$= \sum_{d=1}^{D} \alpha_d\, \mathbb{E}_{\bar\pi}\!\left[\frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right] \log \mathbb{E}_{\bar\pi}\!\left[\frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right].$$

Applying the inequality u log u ≥ u − 1 yields (ii). Moreover, the equality in u log u ≥ u − 1 holds if and only if u = 1. Therefore, equality above is equivalent to

$$\forall d, \qquad \alpha_d \ne 0 \implies \mathbb{E}_{\bar\pi}\left[ q_d(X, X') \Big/ \sum_{j=1}^{D} \alpha_j q_j(X, X') \right] = 1.$$

Thus, I_D = {α ∈ S; E_π̄(F(α)) = E_π̄(α)}. The second equality, I_D = {α ∈ S; F(α) = α}, is straightforward.

We now prove by recursion on D that I_D is finite, which is equivalent to proving that {α ∈ I_D; α_d ≠ 0 for all d ∈ {1, ..., D}} is empty or finite. If this set is nonempty, then any element α in this set satisfies

$$\forall d \in \{1, \ldots, D\}, \qquad \mathbb{E}_{\bar\pi}\left[ q_d(X, X') \Big/ \sum_{j=1}^{D} \alpha_j q_j(X, X') \right] = 1,$$

which implies that

$$0 = \sum_{d=1}^{D} \alpha_d^{\max}\, \mathbb{E}_{\bar\pi}\left[\frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right] - 1 = \mathbb{E}_{\bar\pi}\left[\frac{\sum_{d=1}^{D} \alpha_d^{\max} q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')} - 1\right] \ge \mathbb{E}_{\bar\pi}\left[\log \frac{\sum_{d=1}^{D} \alpha_d^{\max} q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right] \ge 0.$$

Since the global maximum of E_π̄ is unique, we conclude that α = α^max and hence (iii) follows.

4.3. Average EM. Proposition 4.1 implies that our recursive procedure satisfies E_π̄(α^{t+1}) ≥ E_π̄(α^t). Therefore, the Kullback divergence criterion (1) decreases at each step. This property is closely linked to the EM algorithm ([20], Section 5.3). More precisely, consider the mixture model

$$V \sim \mathcal{M}(1, \{\alpha_1, \ldots, \alpha_D\}) \quad\text{and}\quad W = (X, X') \mid V \sim \pi(x)\, q_V(x, x'),$$

with parameter α. We denote by Ē_α the corresponding expectation, by p_α(v, w) the joint density of (V, W) with respect to µ ⊗ µ and by p_α(w) the density of W with respect to µ ⊗ µ. It is then easy to check that

$$E_{\bar\pi}(\alpha) = \int \log(p_\alpha(w))\, \bar\pi(\mathrm{d}w),$$

which is an average version of the criterion to be maximized in the EM algorithm when only W is observed. In this case, a natural idea, adapted from the EM algorithm, is to update α according to the iterative scheme

$$\alpha^{t+1} = \arg\max_{\alpha \in \mathcal{S}} \int \bar{E}_{\alpha^t}\big[\log p_\alpha(V, w) \mid w\big]\, \bar\pi(\mathrm{d}w).$$

Straightforward algebra can be used to show that this definition of α^{t+1} is equivalent to the update formula α^{t+1} = F(α^t) that we used above. Our algorithm then appears as an average EM, and it shares with EM the property that the criterion increases deterministically at each step.

The following result ensures that any α different from α^max is repulsive.

PROPOSITION 4.2. Under (A1) and (A2), for every α ≠ α^max in S, there exists a neighborhood V_α of α such that, if α^{t_0} ∈ V_α, then (α^t)_{t≥t_0} leaves V_α within a finite time.

PROOF. Let α ≠ α^max. Then, using the inequality u − 1 ≥ log u, we have

$$\sum_{d=1}^{D} \alpha_d^{\max}\, \mathbb{E}_{\bar\pi}\left[\frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right] - 1 \ \ge\ \mathbb{E}_{\bar\pi}\left[\log \frac{\sum_{d=1}^{D} \alpha_d^{\max} q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right] > 0,$$

which implies that there exists 1 ≤ d ≤ D such that

$$\mathbb{E}_{\bar\pi}\left[ q_d(X, X') \Big/ \sum_{j=1}^{D} \alpha_j q_j(X, X') \right] > 1.$$

Using a nonincreasing sequence (W_n)_{n≥0} of neighborhoods of α in S, the monotone convergence theorem implies that

$$1 < \mathbb{E}_{\bar\pi}\left[\frac{q_d(X, X')}{\sum_{j=1}^{D} \alpha_j q_j(X, X')}\right] = \mathbb{E}_{\bar\pi}\left[\liminf_{n} \inf_{\beta \in W_n} \frac{q_d(X, X')}{\sum_{j=1}^{D} \beta_j q_j(X, X')}\right] \le \liminf_{n}\, \mathbb{E}_{\bar\pi}\left[\inf_{\beta \in W_n} \frac{q_d(X, X')}{\sum_{j=1}^{D} \beta_j q_j(X, X')}\right].$$

Thus, there exist W_{n_0} = V_α, a neighborhood of α, and η > 1 such that

(2) $$\forall \beta \in V_\alpha, \qquad \mathbb{E}_{\bar\pi}\left[ q_d(X, X') \Big/ \sum_{j=1}^{D} \beta_j q_j(X, X') \right] > \eta.$$

Now, for all t ≥ 0 and 1 ≤ d ≤ D, we have 1 ≥ α_d^t > 0. Combining (2) with the update formula α^t = F(α^{t-1}) shows that (α^t)_{t≥0} leaves V_α within a finite time.

We thus conclude that the maximization algorithm is convergent, as asserted by the following proposition:

PROPOSITION 4.3. Under (A1) and (A2), lim_{t→∞} α^t = α^max.

PROOF. First, recall that I_D is a finite set and that α^max ∈ I_D, that is, I_D = {β_0, β_1, ..., β_I} with β_0 = α^max. We introduce a sequence (W_i)_{0≤i≤I} of disjoint neighborhoods of the β_i's such that, for all 0 ≤ i ≤ I, F(W_i) is disjoint from ∪_{j≠i} W_j [this is possible since F(β_i) = β_i and F is continuous] and, for all i ∈ {1, ..., I}, W_i ⊆ V_{β_i}, where the V_{β_i}'s are defined as in the proof of Proposition 4.2. The sequence (E_π̄(α^t))_{t≥0} is upper-bounded and nondecreasing; therefore it converges. This implies that lim_t E_π̄(F(α^t)) − E_π̄(α^t) = 0. By continuity of E_π̄ ∘ F − E_π̄, there exists T > 0 such that, for all t ≥ T, α^t ∈ ∪_j W_j. Since F(W_i) is disjoint from ∪_{j≠i} W_j, this implies that there exists i ∈ {0, ..., I} such that, for all t ≥ T, α^t ∈ W_i. By Proposition 4.2, i cannot be in {1, ..., I}. Thus, for all t ≥ T, α^t ∈ W_0, which is a neighborhood of β_0 = α^max.

5. The Rao-Blackwellized D-kernel PMC. Updating the weights α_d through the transform F thus improves the Kullback divergence criterion. We now discuss how to implement this mechanism within a PMC algorithm that resembles the previous D-kernel algorithm. The only difference from the algorithm of Section 3.1 is that we make use of the kernel structure in the computation of the importance weight. In MCMC terminology, this is called Rao-Blackwellization ([20], Section 4.2) and it is known to provide variance reduction in data augmentation settings ([20], Section 9.2). In the current context, the improvement brought about by Rao-Blackwellization is dramatic, in that the modified algorithm does converge to the proposal mixture that is closest to the target distribution in the sense of the Kullback divergence. More precisely, a Monte Carlo version of the update via F can be implemented in the iterative definition of the mixture weights, in the same way that MCEM approximates EM [20].

5.1. The algorithm. In importance sampling, as well as in MCMC settings, the deconditioning improvement brought about by Rao-Blackwellization may be significant [5]. In the case of the D-kernel PMC scheme, the Rao-Blackwellization argument is that it is not necessary to condition on the value of the mixture component in the computation of the importance weight and that the improvement is brought about by using the whole mixture. The importance weight should therefore be

$$\pi(x_{i,t}) \Big/ \sum_{d=1}^{D} \alpha_{t,d}\, q_d(\tilde{x}_{i,t-1}, x_{i,t}) \qquad\text{rather than}\qquad \pi(x_{i,t}) \big/ q_{K_{i,t}}(\tilde{x}_{i,t-1}, x_{i,t}),$$

as in the algorithm of Section 3.1. As already noted by Hesterberg [15], the use of the whole mixture in the importance weight is a robust tool that prevents infinite variance importance sampling estimators. In the next section, we show that this choice of weight guarantees that the following modification of the D-kernel algorithm converges to the optimal mixture in terms of Kullback divergence.

RAO-BLACKWELLIZED D-KERNEL PMC ALGORITHM. At time 0, use the same steps as in the D-kernel PMC algorithm to obtain (x̃_{i,0})_{1≤i≤N} and set α_{1,d} = 1/D (1 ≤ d ≤ D). At time t = 1, ..., T,

(a) generate (K_{i,t})_{1≤i≤N} i.i.d. ∼ M(1, (α_{t,d})_{1≤d≤D});
(b) generate independent x_{i,t} ∼ q_{K_{i,t}}(x̃_{i,t-1}, x) (1 ≤ i ≤ N) and compute the normalized importance weights ω_{i,t} ∝ π(x_{i,t}) / Σ_{d=1}^D α_{t,d} q_d(x̃_{i,t-1}, x_{i,t});
(c) resample (x_{i,t})_{1≤i≤N} into (x̃_{i,t})_{1≤i≤N} by multinomial sampling with weights (ω_{i,t})_{1≤i≤N};
(d) set α_{t+1,d} = Σ_{i=1}^N ω_{i,t} I(K_{i,t} = d) (1 ≤ d ≤ D).
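A hedged sketch of this Rao-Blackwellized scheme (again ours, not the authors' R code), using the same `kernels[d] = (sample, logpdf)` convention as the Section 3.1 sketch; the only change is in step (b), where the weight denominator is the full mixture Σ_d α_{t,d} q_d.

```python
import numpy as np

def rb_d_kernel_pmc(sample_nu0, log_nu0, log_pi_tilde, kernels, n, T, seed=0):
    """Sketch of the Rao-Blackwellized D-kernel PMC algorithm."""
    rng = np.random.default_rng(seed)
    D = len(kernels)
    x = sample_nu0(n, rng)                          # time 0: plain SIR step
    logw = log_pi_tilde(x) - log_nu0(x)
    w = np.exp(logw - logw.max()); w /= w.sum()
    x_tilde = x[rng.choice(n, size=n, p=w)]
    alpha = np.full(D, 1.0 / D)
    for _ in range(T):
        K = rng.choice(D, size=n, p=alpha)          # (a) pick kernel indices
        x = np.empty_like(x_tilde)
        for d in range(D):
            sel = (K == d)
            if sel.any():
                x[sel] = kernels[d][0](x_tilde[sel], rng)   # (b) propagate
        # Rao-Blackwellized weights: full-mixture denominator, computed in
        # log scale for numerical stability
        logq = np.stack([kernels[d][1](x_tilde, x) for d in range(D)])
        m = logq.max(axis=0)
        log_mix = m + np.log(alpha @ np.exp(logq - m))
        logw = log_pi_tilde(x) - log_mix
        w = np.exp(logw - logw.max()); w /= w.sum()
        x_tilde = x[rng.choice(n, size=n, p=w)]     # (c) resample
        alpha = np.array([w[K == d].sum() for d in range(D)])  # (d)
    return x, w, alpha
```

Only the weight computation differs from the Section 3.1 sketch; by Proposition 5.1 below, the α iterates now track the deterministic sequence α^{t+1} = F(α^t) rather than collapsing to the uniform weights.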

5.2. The corresponding law of large numbers. Not very surprisingly, the sample obtained at each iteration of the above Rao-Blackwellized algorithm approximates the target distribution in the sense of the weak law of large numbers (LLN). Note that the convergence holds under the very weak assumption (A1) and for any test function h that is absolutely integrable with respect to the target distribution π. The function h may thus be unbounded.

THEOREM 5.1. Under (A1), for any function h in L¹(π) and for all t ≥ 0,

$$\sum_{i=1}^{N} \omega_{i,t}\, h(x_{i,t}) \xrightarrow{P} \pi(h) \qquad\text{and}\qquad \frac{1}{N} \sum_{i=1}^{N} h(\tilde{x}_{i,t}) \xrightarrow{P} \pi(h).$$

PROOF. First, convergence of the second average follows from the convergence of the first average by Theorem A.1, since the latter is simply a multinomial sampling perturbation of the former. We thus focus on the weighted average and proceed by induction on t. The case t = 0 is the basic importance sampling convergence result. For t ≥ 1, if the convergence holds for t − 1, then to prove convergence at iteration t, we need only check, as in Proposition 3.1, that

$$\frac{1}{N} \sum_{i=1}^{N} \bar\omega_{i,t}\, h(x_{i,t}) \xrightarrow{P} \pi(h),$$

where ω̄_{i,t} denotes the unnormalized importance weight π(x_{i,t}) / Σ_{d=1}^D α_{t,d} q_d(x̃_{i,t-1}, x_{i,t}). The special case h ≡ 1 ensures that the renormalizing sum converges to 1. We apply Theorem A.1 with

$$\mathcal{G}_N = \sigma\big( (\tilde{x}_{i,t-1})_{1\le i\le N},\ (\alpha_{t,d})_{1\le d\le D} \big) \qquad\text{and}\qquad U_{N,i} = N^{-1}\, \bar\omega_{i,t}\, h(x_{i,t}).$$

Then, conditionally on G_N, the x_{i,t}'s (1 ≤ i ≤ N) are independent and

$$x_{i,t} \mid \mathcal{G}_N \sim \sum_{d=1}^{D} \alpha_{t,d}\, Q_d(\tilde{x}_{i,t-1}, \cdot).$$

Noting that

$$\mathbb{E}\big[\bar\omega_{i,t}\, h(x_{i,t}) \mid \mathcal{G}_N\big] = \mathbb{E}\left[\frac{\pi(x_{i,t})\, h(x_{i,t})}{\sum_{d=1}^{D} \alpha_{t,d}\, q_d(\tilde{x}_{i,t-1}, x_{i,t})} \,\Big|\, \mathcal{G}_N\right] = \pi(h),$$

we only need to check condition (iii). The end of the proof is then quite similar to the proof of Proposition 3.1.

5.3. Convergence of the weights. The next proposition ensures that, at each iteration, the update of the mixture weights in the Rao-Blackwellized algorithm approximates the theoretical update obtained in Section 4.2 for minimizing the Kullback divergence criterion.

PROPOSITION 5.1. Under (A1), for all t ≥ 1 and 1 ≤ d ≤ D,

(3) $$\alpha_{t,d} \xrightarrow{P} \alpha_d^t, \qquad\text{where } \alpha^t = F(\alpha^{t-1}).$$

Combining Proposition 5.1 with Proposition 4.3, we obtain that, under assumptions (A1)-(A2), the Rao-Blackwellized version of the PMC algorithm adapts the weights of the proposed mixture of kernels, in the sense that it converges to the optimal combination of mixtures with respect to the Kullback divergence criterion obtained in Section 4.2.

PROOF OF PROPOSITION 5.1. The case t = 1 is obvious. Now, assume that (3) holds for some t ≥ 1. As in the proof of Proposition 3.1, we now establish that

$$\frac{1}{N} \sum_{i=1}^{N} \bar\omega_{i,t}\, \mathbb{I}(K_{i,t} = d) = \frac{1}{N} \sum_{i=1}^{N} \frac{\pi(x_{i,t})}{\sum_{l=1}^{D} \alpha_{t,l}\, q_l(\tilde{x}_{i,t-1}, x_{i,t})}\, \mathbb{I}(K_{i,t} = d) \xrightarrow{P} \alpha_d^{t+1},$$

the convergence of the renormalizing sum to 1 being a consequence of this convergence. We apply Theorem A.1 with

$$\mathcal{G}_N = \sigma\big( (\tilde{x}_{i,t-1})_{1\le i\le N},\ (\alpha_{t,d})_{1\le d\le D} \big) \qquad\text{and}\qquad U_{N,i} = N^{-1}\, \bar\omega_{i,t}\, \mathbb{I}(K_{i,t} = d).$$

Conditionally on G_N, the (K_{i,t}, x_{i,t})'s (1 ≤ i ≤ N) are independent and, for all (d, A) in {1, ..., D} × A, we have

$$\mathbb{P}(K_{i,t} = d,\ x_{i,t} \in A \mid \mathcal{G}_N) = \alpha_{t,d}\, Q_d(\tilde{x}_{i,t-1}, A).$$

To apply Theorem A.1, we need only check condition (iii). We have, for C > 0,

$$\frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\big[\bar\omega_{i,t}\, \mathbb{I}(K_{i,t} = d)\, \mathbb{I}\{\bar\omega_{i,t}\, \mathbb{I}(K_{i,t} = d) > C\} \mid \mathcal{G}_N\big] \le \sum_{j=1}^{D} \frac{1}{N} \sum_{i=1}^{N} \int \mathbb{I}\left\{\frac{\pi(x)}{D^{-1}\, q_j(\tilde{x}_{i,t-1}, x)} > C\right\} \pi(\mathrm{d}x) \xrightarrow{P} \sum_{j=1}^{D} \bar\pi\left\{\frac{\pi(x')}{D^{-1}\, q_j(x, x')} > C\right\},$$

by the LLN of Theorem 5.1. The right-hand side converges to 0 as C tends to infinity since, by assumption (A1), π̄{q_j(x, x′) = 0} = 0. Thus, Theorem A.1 applies and

$$\frac{1}{N} \sum_{i=1}^{N} \Big( \bar\omega_{i,t}\, \mathbb{I}(K_{i,t} = d) - \mathbb{E}\big[\bar\omega_{i,t}\, \mathbb{I}(K_{i,t} = d) \mid \mathcal{G}_N\big] \Big) \xrightarrow{P} 0.$$

To complete the proof, it simply remains to show that

(4) $$\frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\big[\bar\omega_{i,t}\, \mathbb{I}(K_{i,t} = d) \mid \mathcal{G}_N\big] = \frac{1}{N} \sum_{i=1}^{N} \int \frac{\alpha_{t,d}\, q_d(\tilde{x}_{i,t-1}, x)}{\sum_{l=1}^{D} \alpha_{t,l}\, q_l(\tilde{x}_{i,t-1}, x)}\, \pi(\mathrm{d}x) \xrightarrow{P} \alpha_d^{t+1}.$$

It follows from the LLN stated in Theorem 5.1 that

(5) $$\frac{1}{N} \sum_{i=1}^{N} \int \frac{\alpha_d^t\, q_d(\tilde{x}_{i,t-1}, x)}{\sum_{l=1}^{D} \alpha_l^t\, q_l(\tilde{x}_{i,t-1}, x)}\, \pi(\mathrm{d}x) \xrightarrow{P} \alpha_d^t\, \mathbb{E}_{\bar\pi}\left[\frac{q_d(X, X')}{\sum_{l=1}^{D} \alpha_l^t\, q_l(X, X')}\right] = \alpha_d^{t+1},$$

and it thus suffices to check that the difference between (4) and (5) converges to 0 in probability. To show this, first note that, for all t ≥ 1 and all 1 ≤ d ≤ D, α_d^t > 0 and thus, by the induction assumption, for all 1 ≤ d ≤ D, (α_{t,d} − α_d^t)/α_d^t →_P 0. Straightforward algebra then yields the bound

$$\left| \frac{\alpha_{t,d}\, q_d(\tilde{x}_{i,t-1}, x)}{\sum_{l=1}^{D} \alpha_{t,l}\, q_l(\tilde{x}_{i,t-1}, x)} - \frac{\alpha_d^t\, q_d(\tilde{x}_{i,t-1}, x)}{\sum_{l=1}^{D} \alpha_l^t\, q_l(\tilde{x}_{i,t-1}, x)} \right| \le 2 \sup_{l \in \{1, \ldots, D\}} \frac{|\alpha_{t,l} - \alpha_l^t|}{\alpha_l^t}.$$

The proposition then follows from the convergence (α_{t,l} − α_l^t)/α_l^t →_P 0.

5.4. A corresponding central limit theorem. We now establish a CLT for the weighted and the unweighted samples when the sample size N goes to infinity. As noted in Section 2.2 for the SIR algorithm, the asymptotic variance associated with the unweighted sample is larger than the variance of the weighted sample because of the additional multinomial step.

THEOREM 5.2. Under (A1),

(i) for a function h such that π̄{h²(x′)π(x′)/q_d(x, x′)} < ∞ for at least one 1 ≤ d ≤ D, we have

(6) $$\sqrt{N} \sum_{i=1}^{N} \omega_{i,t}\, \{h(x_{i,t}) - \pi(h)\} \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma_t^2),$$

where

$$\sigma_t^2 = \int \{h(x') - \pi(h)\}^2\, \frac{\pi(x')}{\sum_{d=1}^{D} \alpha_d^t\, q_d(x, x')}\, \bar\pi(\mathrm{d}x, \mathrm{d}x');$$

(ii) if, moreover, π(h²) < ∞, then

(7) $$\sqrt{N}\, \Big\{ N^{-1} \sum_{i=1}^{N} h(\tilde{x}_{i,t}) - \pi(h) \Big\} \xrightarrow{\mathcal{L}} \mathcal{N}\big(0, \sigma_t^2 + V_\pi(h)\big).$$

Note that, amongst the conditions under which this theorem applies, the integrability condition

(8) $$\bar\pi\left[\frac{h^2(x')\, \pi(x')}{q_d(x, x')}\right] < \infty$$

is required for some d in {1, ..., D} and not for all d's. Thus, settings where some transition kernels q_d(·, ·) do not satisfy (8) can still be covered by this theorem, provided (8) holds for at least one particular kernel.

An equivalent expression for the asymptotic variance σ_t² is

$$\sigma_t^2 = V_\nu\big(\{h - \pi(h)\}\, \bar\pi/\nu\big), \qquad\text{where}\quad \nu(x, x') = \pi(x) \sum_{d=1}^{D} \alpha_d^t\, q_d(x, x').$$

Written in this form, σ_t² turns out to be the expression of the asymptotic variance that appears in the CLT associated with a self-normalized IS algorithm (SIS; see Section 2.1 for a description of the algorithm) where the proposal distribution would be ν and the target distribution π̄. Obviously, this SIS algorithm cannot be implemented since, given the above definition of ν, the proposal distribution depends on both π and the weights α_d^t, which are unknown.

PROOF OF THEOREM 5.2. Without loss of generality, we assume that π(h) = 0. Let d_0 ∈ {1, ..., D} be such that π̄(h²(x′)π(x′)/q_{d_0}(x, x′)) < ∞. A consequence of the proof of Theorem 5.1 is that Σ_{i=1}^N ω̄_{i,t}/N →_P 1, so we only need to prove that

(9) $$N^{-1/2} \sum_{i=1}^{N} \bar\omega_{i,t}\, h(x_{i,t}) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma_t^2).$$

We will apply Theorem A.2 with

$$U_{N,i} = N^{-1/2}\, \bar\omega_{i,t}\, h(x_{i,t}) = N^{-1/2}\, \frac{\pi(x_{i,t})\, h(x_{i,t})}{\sum_{d=1}^{D} \alpha_{t,d}\, q_d(\tilde{x}_{i,t-1}, x_{i,t})}, \qquad \mathcal{G}_N = \sigma\big\{ (\tilde{x}_{i,t-1})_{1\le i\le N},\ (\alpha_{t,d})_{1\le d\le D} \big\}.$$

Conditionally on G_N, the x_{i,t}'s (1 ≤ i ≤ N) are independent and x_{i,t} | G_N ∼ Σ_{d=1}^D α_{t,d} Q_d(x̃_{i,t-1}, ·). Conditions (i) and (ii) of Theorem A.2 are clearly satisfied. To check condition (iii), first note that E(U_{N,i} | G_N) = N^{-1/2} π(h) = 0. Moreover,

$$A_N = \sum_{i=1}^{N} \mathbb{E}(U_{N,i}^2 \mid \mathcal{G}_N) = \frac{1}{N} \sum_{i=1}^{N} \int h^2(x)\, \frac{\pi(x)}{\sum_{d=1}^{D} \alpha_{t,d}\, q_d(\tilde{x}_{i,t-1}, x)}\, \pi(\mathrm{d}x).$$

By the LLN for (x̃_{i,t}) stated in Theorem 5.1, we have

$$B_N = \frac{1}{N} \sum_{i=1}^{N} \int h^2(x)\, \frac{\pi(x)}{\sum_{d=1}^{D} \alpha_d^t\, q_d(\tilde{x}_{i,t-1}, x)}\, \pi(\mathrm{d}x) \xrightarrow{P} \sigma_t^2.$$

To prove that (iii) holds, it is thus sufficient to show that B_N − A_N →_P 0. Since α_{t,d_0} →_P α_{d_0}^t > 0, we need only consider the upper bound

$$\mathbb{I}\{\alpha_{t,d_0} > 2^{-1} \alpha_{d_0}^t\}\, |B_N - A_N| \le \mathbb{I}\{\alpha_{t,d_0} > 2^{-1} \alpha_{d_0}^t\} \sup_{1\le d\le D} \frac{|\alpha_{t,d} - \alpha_d^t|}{\alpha_d^t}\ \frac{1}{N} \sum_{j=1}^{N} \int \frac{h^2(x)\, \pi(x)}{2^{-1} \alpha_{d_0}^t\, q_{d_0}(\tilde{x}_{j,t-1}, x)}\, \pi(\mathrm{d}x) \xrightarrow{P} 0.$$

Thus, condition (iii) is satisfied. Finally, we consider condition (iv). Using the same argument as was used for condition (iii), we have that

$$\mathbb{I}\{\alpha_{t,d_0} > 2^{-1} \alpha_{d_0}^t\}\, \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\left[ \bar\omega_{i,t}^2\, h^2(x_{i,t})\, \mathbb{I}\left\{ \frac{\pi(x_{i,t})\, |h(x_{i,t})|}{\sum_{d=1}^{D} \alpha_{t,d}\, q_d(\tilde{x}_{i,t-1}, x_{i,t})} > C \right\} \,\Big|\, \mathcal{G}_N \right]$$
$$\le \frac{1}{N} \sum_{i=1}^{N} \int \frac{h^2(x)\, \pi(x)}{2^{-1} \alpha_{d_0}^t\, q_{d_0}(\tilde{x}_{i,t-1}, x)}\, \mathbb{I}\left\{ \frac{\pi(x)\, |h(x)|}{2^{-1} \alpha_{d_0}^t\, q_{d_0}(\tilde{x}_{i,t-1}, x)} > C \right\} \pi(\mathrm{d}x) \xrightarrow{P} \bar\pi\left[ \frac{h^2(x')\, \pi(x')}{2^{-1} \alpha_{d_0}^t\, q_{d_0}(x, x')}\, \mathbb{I}\left\{ \frac{\pi(x')\, |h(x')|}{2^{-1} \alpha_{d_0}^t\, q_{d_0}(x, x')} > C \right\} \right],$$

which converges to 0 as C tends to infinity. Thus, Theorem A.2 can be applied and the proof of (6) is complete. The proof of (7) follows from a direct application of Theorem A.2, as in the SIR result, by setting U_{N,i} = N^{-1/2} h(x̃_{i,t}) and G_N = σ{(x_{i,t})_{1≤i≤N}, (ω_{i,t})_{1≤i≤N}}.
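For practical use of the CLT (6), σ_t² must itself be estimated from the simulation output. A standard self-normalized importance sampling plug-in can be used (our suggestion; the paper does not prescribe a particular estimator):

```python
import numpy as np

def sis_variance_estimate(h_vals, w):
    """Plug-in estimate of the asymptotic variance in (6) from a weighted
    sample: N * sum_i wbar_i^2 (h_i - mu_hat)^2, with normalized weights
    wbar and self-normalized mean mu_hat."""
    w = np.asarray(w, dtype=float)
    wbar = w / w.sum()
    mu_hat = np.sum(wbar * h_vals)
    return float(len(w) * np.sum(wbar**2 * (h_vals - mu_hat) ** 2))
```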

6. Illustrations. In this section, we briefly show how the iterations of the PMC algorithm quickly implement adaptivity toward the most efficient mixture of kernels, through three examples of moderate difficulty. The R programs are available from the last author's website via an Snw file.

EXAMPLE 1. As a first toy example, we take the target π to be the density of a five-dimensional normal mixture (10), with three equally weighted components and covariance matrices Σ_i, and an independent normal mixture proposal with the same means and variances as (10), but started with different weights. Note that this is a very special case of a D-kernel PMC scheme in that the transition kernels of Section 5.1 are then independent proposals. In this case, the optimal choice of weights is obviously α_d = 1/3. In our experiment, we used three Wishart-simulated variances Σ_i, with 10, 15 and 7 degrees of freedom, respectively. The starting values α_{1,d} are indicated on the left of Figure 1, which clearly shows the convergence to the optimal values 1/3 and 2/3 for the first two accumulated weights in less than ten iterations. Generating more simulated points at each iteration stabilizes the convergence graph, but the primary aim of this example is to exhibit the fast convergence to the true optimal values of the weights.

FIG. 1. Convergence of the accumulated weights α_{t,1} and α_{t,1} + α_{t,2} for the three-component normal mixture to the optimal values 1/3 and 2/3, represented by dotted lines. At each iteration, N = 1,000 points were simulated from the D-kernel proposal.
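The independent-proposal special case is easy to reproduce in miniature. The self-contained sketch below uses a hypothetical one-dimensional analogue of Example 1 (the paper's five-dimensional means and Wishart variances are not reproduced; the means and standard deviations here are our own stand-ins): the target is an equally weighted mixture of three normals, the D = 3 kernels are the corresponding components used as independent proposals, and the Rao-Blackwellized update drives α to (1/3, 1/3, 1/3).

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([-3.0, 0.0, 4.0])     # hypothetical component means
sd = np.array([1.0, 0.5, 1.5])      # hypothetical component std deviations

def log_norm(x, m, s):
    return -0.5 * ((x - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)

def log_pi(x):
    """Equally weighted three-component normal mixture target."""
    comps = np.stack([log_norm(x, m, s) for m, s in zip(mu, sd)])
    mx = comps.max(axis=0)
    return mx + np.log(np.exp(comps - mx).sum(axis=0) / 3.0)

n, T = 10_000, 10
alpha = np.array([0.8, 0.15, 0.05])  # deliberately far from the optimum
for t in range(T):
    K = rng.choice(3, size=n, p=alpha)       # component indicators
    x = rng.normal(mu[K], sd[K])             # independent mixture proposal
    logq = np.stack([log_norm(x, m, s) for m, s in zip(mu, sd)])
    mx = logq.max(axis=0)
    log_mix = mx + np.log(alpha @ np.exp(logq - mx))
    w = np.exp(log_pi(x) - log_mix)          # Rao-Blackwellized weights
    w /= w.sum()
    # independent proposals: the resampling step (c) is not needed here
    alpha = np.array([w[K == d].sum() for d in range(3)])
print(alpha)  # approaches (1/3, 1/3, 1/3), as in Figure 1
```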

EXAMPLE 2. As a second toy example, consider the case of a three-dimensional normal N_3(0, Σ) target with covariance matrix Σ and the mixture of three kernels given by

(11) $$\alpha_{t,1}\, T_2(\tilde{x}_{i,t-1}, \Sigma_1) + \alpha_{t,2}\, \mathcal{N}(\tilde{x}_{i,t-1}, \Sigma_2) + \alpha_{t,3}\, \mathcal{N}(\tilde{x}_{i,t-1}, \Sigma_3),$$

where the Σ_i's are random Wishart matrices. [The first proposal in the mixture is a product of unidimensional Student t-distributions with two degrees of freedom, centered at the current values of the components of x̃_{i,t-1} and rotated by τ_1 diag(Σ_1)^{1/2}.] A compelling feature of this example is that we can visualize the Kullback divergence on the R³ simplex, in the sense that the divergence

$$\mathcal{E}\Big(\bar\pi,\ \pi \otimes \sum_d \alpha_d Q_d\Big) = \mathbb{E}_{\bar\pi}\left[\log\Big\{ \pi(X') \Big/ \sum_{d=1}^{3} \alpha_d\, q_d(X, X') \Big\}\right] = e(\alpha_1, \alpha_2, \alpha_3)$$

can be approximated by a Monte Carlo experiment on a grid of values of (α_1, α_2). Figure 2 shows the result of this Monte Carlo experiment and exhibits a minimum divergence inside the R³ simplex. Running the Rao-Blackwellized D-kernel PMC algorithm from a random starting weight (α_1, α_2, α_3) always leads to a neighborhood of the minimum, even though a strict decrease in the divergence requires a large value for N and a precise convergence to α^max necessitates a large number of iterations T.

EXAMPLE 3. Our third example is a contingency table inspired by [1], given here as Table 1. We model this dataset by a Poisson regression,

$$x_{ij} \sim \mathcal{P}\big(\exp(\alpha_i + \beta_j)\big), \qquad i, j = 0, 1,$$

with α_0 = 0 for identifiability reasons. We use a flat prior on the parameter θ = (α_1, β_0, β_1) and run the PMC D-kernel algorithm with a mixture of ten normal random walk proposals,

$$\mathcal{N}\big(\theta_{i,t-1},\ \varrho_d\, I(\hat\theta)\big), \qquad d = 1, \ldots, 10,$$

where I(θ̂) is the Fisher information matrix evaluated at the MLE, θ̂ = (0.43, 4.06, 5.9), and where the scales ϱ_d vary from 1.35e−19 to 1.54e+07 (the ϱ_d's are equidistributed on a logarithmic scale).

FIG. 2. Grey level and contour representation of the Kullback divergence e(α_1, α_2, α_3) between the N_3(0, Σ) distribution and the three-component mixture proposal (11). The darker pixels correspond to lower values of the divergence. We also represent in white the path of one run of the D-kernel PMC algorithm when started from a random value of (α_1, α_2, α_3). The number of iterations T is equal to 500, while the sample size N is 50,000.

The results of five successive iterations of the Rao-Blackwell D-kernel algorithm are as follows: unsurprisingly, the largest variance kernels are hardly ever sampled, but fulfill their main role of variance stabilizers in the importance sampling weights, while the mixture concentrates on the medium variances, with a quick convergence of the mixture weights to the limiting weights (the accumulated weights of the 5th, 6th, 7th and 8th components of the mixture converge to 0, 0.003, and 0.738, respectively). The fit of the simulated sample to the target distribution is shown in Figure 3, since the points of the sample do coincide with the unique modal region of the posterior distribution. This experiment also shows that there is no degeneracy in the samples produced: most points in the last sample have very similar posterior values. For instance, 20% of the sample corresponds to 95% of the weights, while 1% of the sample corresponds to 31% of the weights.

TABLE 1. Two-by-two contingency table.

FIG. 3. Distribution of 5,000 resampled points after five iterations of the Rao-Blackwellized D-kernel PMC sampler for the contingency table example. Top: histograms of the components α_1, β_0 and β_1; bottom: scatterplots of the points (α_1, β_0), (α_1, β_1) and (β_0, β_1) on the profile slices of the log-likelihood.

A closer look at convergence is provided by Figure 4, where the histograms of the resampled samples are represented, along with the distribution of the log-likelihood and the empirical cumulative distribution function (cdf) of the importance weights. They do not signal any degeneracy phenomenon but, rather, the opposite: a clear stabilization around the values of interest.

7. Conclusion and perspectives. This paper shows that it is possible to build an adaptive mixture of proposals aimed at a minimization of the Kullback divergence with the distribution of interest. We can therefore set different goals for a simulation experiment and expect to arrive at the most accurate proposal. For instance, in a companion paper [9], we also derive an adaptive update of the weights targeted at the minimal variance proposal for a given integral I of interest. Rather naturally, these results are achieved under strong restrictions on the family of proposals, in the sense that the parameterization of those families is restricted to the weights of the mixture. It is, however, possible to extend the above results to general mixtures of parameterized proposals, as shown by work currently under development.

A more practical direction of research is the implementation of such adaptive algorithms in large-dimensional problems. While our algorithms are in fine importance sampling algorithms, it is conceivable that mixtures of Gibbs-like proposals can take better advantage of the intuition gained from MCMC methodology, while keeping the finite-horizon validation of importance sampling methods. The major difficulty in this direction, however, is that the curse of dimensionality still holds, in the sense that (a) we need to simultaneously consider more and more proposals as the dimension increases (as, e.g., the set of all full conditionals) and (b) the number of parameters to tune in the proposals exponentially increases with the dimension.

APPENDIX A: CONVERGENCE THEOREMS FOR TRIANGULAR ARRAYS

In this section, we recall convergence results for triangular arrays of random variables (see [4] or [10] for more details, including the proofs). We will use these results to study the asymptotic behavior of the PMC algorithm. In what follows, let {U_{N,i}}_{N≥1, 1≤i≤N} be a triangular array of random variables defined on the same measurable space (Ω, A) and let {G_N}_{N≥1} be a sequence of σ-algebras included in A. The symbol X_N →_P a means that X_N converges in probability to a as N goes to infinity.

DEFINITION A.1. The sequence {U_{N,i}}_{N≥1, 1≤i≤N} is independent given {G_N}_{N≥1} if, for all N ≥ 1, the random variables U_{N,1}, ..., U_{N,N} are independent given G_N.

FIG. 4. Evolution of the samples over four iterations of the Rao-Blackwellized D-kernel PMC sampler for the contingency table example (the output from each iteration is a block of four graphs, to be read from left to right and from top to bottom): histograms of the resampled samples of α_1, β_0 and β_1 of size 50,000 and (lower right of each block) log-likelihood and the empirical cumulative distribution function of the importance weights.

DEFINITION A.2. The sequence of variables {Z_N}_{N≥1} is bounded in probability if

$$\lim_{C \to \infty} \sup_{N \ge 1} \mathbb{P}[|Z_N| \ge C] = 0.$$

THEOREM A.1. If

(i) {U_{N,i}}_{N≥1, 1≤i≤N} is independent given {G_N}_{N≥1};
(ii) the sequence {Σ_{i=1}^N E[|U_{N,i}| | G_N]}_{N≥1} is bounded in probability;
(iii) for all η > 0, Σ_{i=1}^N E[|U_{N,i}| I(|U_{N,i}| > η) | G_N] →_P 0,

then

$$\sum_{i=1}^{N} \big( U_{N,i} - \mathbb{E}[U_{N,i} \mid \mathcal{G}_N] \big) \xrightarrow{P} 0.$$

THEOREM A.2. If

(i) {U_{N,i}}_{N≥1, 1≤i≤N} is independent given {G_N}_{N≥1};
(ii) for all N ≥ 1 and i ∈ {1, ..., N}, E[U_{N,i}² | G_N] < ∞;
(iii) there exists σ² > 0 such that Σ_{i=1}^N (E[U_{N,i}² | G_N] − E[U_{N,i} | G_N]²) →_P σ²;
(iv) for all η > 0, Σ_{i=1}^N E[U_{N,i}² I(|U_{N,i}| > η) | G_N] →_P 0,

then, for all u ∈ R,

$$\mathbb{E}\left[ \exp\Big( \mathrm{i} u \sum_{i=1}^{N} \big( U_{N,i} - \mathbb{E}[U_{N,i} \mid \mathcal{G}_N] \big) \Big) \,\Big|\, \mathcal{G}_N \right] \xrightarrow{P} \exp\left( -\frac{u^2 \sigma^2}{2} \right).$$

APPENDIX B: PROOF OF PROPOSITION 3.1

We proceed by induction with respect to t. Using Theorem A.1, the case t = 0 is straightforward, as a direct consequence of the convergence of the importance sampling algorithm. For t ≥ 1, let us assume that the LLN holds for t − 1. Then, to prove that Σ_{i=1}^N ω_{i,t} h(x_{i,t}, K_{i,t}) converges in probability to π ⊗ γ_u(h), we need only check that

$$\frac{1}{N} \sum_{i=1}^{N} \frac{\pi(x_{i,t})}{q_{K_{i,t}}(\tilde{x}_{i,t-1}, x_{i,t})}\, h(x_{i,t}, K_{i,t}) \xrightarrow{P} \pi \otimes \gamma_u(h),$$

the special case h ≡ 1 providing the convergence of the normalizing constant for the importance weights.

We apply Theorem A.1 with

$$U_{N,i} = \frac{1}{N}\, \frac{\pi(x_{i,t})}{q_{K_{i,t}}(\tilde{x}_{i,t-1}, x_{i,t})}\, h(x_{i,t}, K_{i,t}) \qquad\text{and}\qquad \mathcal{G}_N = \sigma\big\{ (\tilde{x}_{i,t-1})_{1\le i\le N},\ (\alpha_{t,d})_{1\le d\le D} \big\},$$

where σ{X_i, i ∈ I} denotes the σ-algebra induced by the X_i's; we need only check condition (iii). For any C > 0, we have

(12) $$\frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\left[ \frac{\pi(x_{i,t})\, h(x_{i,t}, K_{i,t})}{q_{K_{i,t}}(\tilde{x}_{i,t-1}, x_{i,t})}\, \mathbb{I}\left\{ \frac{\pi(x_{i,t})\, h(x_{i,t}, K_{i,t})}{q_{K_{i,t}}(\tilde{x}_{i,t-1}, x_{i,t})} > C \right\} \,\Big|\, \mathcal{G}_N \right] = \frac{1}{N} \sum_{i=1}^{N} \sum_{d=1}^{D} \alpha_{t,d}\, F_C(\tilde{x}_{i,t-1}, d),$$

where

$$F_C(x, k) = \int \pi(\mathrm{d}u)\, h(u, k)\, \mathbb{I}\left\{ \frac{\pi(u)}{q_k(x, u)}\, h(u, k) \ge C \right\}.$$

By induction, we have

$$\alpha_{t,d} = \sum_{i=1}^{N} \omega_{i,t-1}\, \mathbb{I}(K_{i,t-1} = d) \xrightarrow{P} 1/D \qquad\text{and}\qquad \frac{1}{N} \sum_{i=1}^{N} F_C(\tilde{x}_{i,t-1}, k) \xrightarrow{P} \pi\big(F_C(\cdot, k)\big).$$

Using these limits in (12) yields

$$\frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\left[ \frac{\pi(x_{i,t})\, h(x_{i,t}, K_{i,t})}{q_{K_{i,t}}(\tilde{x}_{i,t-1}, x_{i,t})}\, \mathbb{I}\left\{ \frac{\pi(x_{i,t})\, h(x_{i,t}, K_{i,t})}{q_{K_{i,t}}(\tilde{x}_{i,t-1}, x_{i,t})} > C \right\} \,\Big|\, \mathcal{G}_N \right] \xrightarrow{P} \pi \otimes \gamma_u(F_C).$$

Since π ⊗ γ_u(F_C) converges to 0 as C goes to infinity, this proves that, for any η > 0,

$$\sum_{i=1}^{N} \mathbb{E}\big[ |U_{N,i}|\, \mathbb{I}\{|U_{N,i}| > \eta\} \mid \mathcal{G}_N \big] \xrightarrow{P} 0.$$

Condition (iii) is satisfied and Theorem A.1 applies. The proof follows.

Acknowledgments. The authors are grateful to Olivier Cappé, Paul Fearnhead and Eric Moulines for helpful comments and discussions. Comments from two referees helped considerably in improving the focus and presentation of our results.

REFERENCES

[1] AGRESTI, A. Categorical Data Analysis, 2nd ed. Wiley, New York.
[2] ANDRIEU, C. and ROBERT, C. Controlled Markov chain Monte Carlo methods for optimal sampling. Technical Report 0125, Univ. Paris Dauphine.
[3] CAPPÉ, O., GUILLIN, A., MARIN, J. and ROBERT, C. Population Monte Carlo. J. Comput. Graph. Statist.
[4] CAPPÉ, O., MOULINES, E. and RYDÉN, T. Inference in Hidden Markov Models. Springer, New York.
[5] CELEUX, G., MARIN, J. and ROBERT, C. Iterated importance sampling in missing data problems. Comput. Statist. Data Anal.
[6] CHOPIN, N. Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Statist.
[7] CSISZÁR, I. and TUSNÁDY, G. Information geometry and alternating minimization procedures. Recent results in estimation theory and related topics. Statist. Decisions 1984, suppl.
[8] DEL MORAL, P., DOUCET, A. and JASRA, A. Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B Stat. Methodol.
[9] DOUC, R., GUILLIN, A., MARIN, J. and ROBERT, C. Minimum variance importance sampling via population Monte Carlo. Technical report, Cahiers du CEREMADE, Univ. Paris Dauphine.
[10] DOUC, R. and MOULINES, E. Limit theorems for properly weighted samples with applications to sequential Monte Carlo. Technical report, TSI, Télécom Paris.
[11] DOUCET, A., DE FREITAS, N. and GORDON, N., eds. Sequential Monte Carlo Methods in Practice. Springer, New York.
[12] GILKS, W., ROBERTS, G. and SAHU, S. Adaptive Markov chain Monte Carlo through regeneration. J. Amer. Statist. Assoc.
[13] HAARIO, H., SAKSMAN, E. and TAMMINEN, J. Adaptive proposal distribution for random walk Metropolis algorithm. Comput. Statist.
[14] HAARIO, H., SAKSMAN, E. and TAMMINEN, J. An adaptive Metropolis algorithm. Bernoulli.
[15] HESTERBERG, T. Weighted average importance sampling and defensive mixture distributions. Technometrics.
[16] IBA, Y. Population-based Monte Carlo algorithms. Trans. Japanese Society for Artificial Intelligence.
[17] KÜNSCH, H. Recursive Monte Carlo filters: Algorithms and theoretical analysis. Ann. Statist.
[18] MENGERSEN, K. L. and TWEEDIE, R. L. Rates of convergence of the Hastings and Metropolis algorithms. Ann. Statist.
[19] ROBERT, C. Intrinsic losses. Theory and Decision.
[20] ROBERT, C. and CASELLA, G. Monte Carlo Statistical Methods, 2nd ed. Springer, New York.
[21] ROBERTS, G. O., GELMAN, A. and GILKS, W. R. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab.
[22] RUBIN, D. Using the SIR algorithm to simulate posterior distributions. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.). Oxford Univ. Press.
[23] SAHU, S. and ZHIGLJAVSKY, A. Adaptation for self regenerative MCMC. Technical report, Univ. of Wales, Cardiff.


More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

WEIGHTING A RESAMPLED PARTICLES IN SEQUENTIAL MONTE CARLO (EXTENDED PREPRINT) L. Martino, V. Elvira, F. Louzada

WEIGHTING A RESAMPLED PARTICLES IN SEQUENTIAL MONTE CARLO (EXTENDED PREPRINT) L. Martino, V. Elvira, F. Louzada WEIGHTIG A RESAMLED ARTICLES I SEQUETIAL MOTE CARLO (ETEDED RERIT) L. Martino, V. Elvira, F. Louzaa Dep. of Signal Theory an Communic., Universia Carlos III e Mari, Leganés (Spain). Institute of Mathematical

More information

A Modification of the Jarque-Bera Test. for Normality

A Modification of the Jarque-Bera Test. for Normality Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam

More information

Least-Squares Regression on Sparse Spaces

Least-Squares Regression on Sparse Spaces Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction

More information

A new proof of the sharpness of the phase transition for Bernoulli percolation on Z d

A new proof of the sharpness of the phase transition for Bernoulli percolation on Z d A new proof of the sharpness of the phase transition for Bernoulli percolation on Z Hugo Duminil-Copin an Vincent Tassion October 8, 205 Abstract We provie a new proof of the sharpness of the phase transition

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

LOGISTIC regression model is often used as the way

LOGISTIC regression model is often used as the way Vol:6, No:9, 22 Secon Orer Amissibilities in Multi-parameter Logistic Regression Moel Chie Obayashi, Hiekazu Tanaka, Yoshiji Takagi International Science Inex, Mathematical an Computational Sciences Vol:6,

More information

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties Journal of Machine Learning Research 16 (2015) 1547-1572 Submitte 1/14; Revise 9/14; Publishe 8/15 Flexible High-Dimensional Classification Machines an Their Asymptotic Properties Xingye Qiao Department

More information

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay

More information

Introduction to Markov Processes

Introduction to Markov Processes Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Generalized Tractability for Multivariate Problems

Generalized Tractability for Multivariate Problems Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,

More information

Linear First-Order Equations

Linear First-Order Equations 5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)

More information

26.1 Metropolis method

26.1 Metropolis method CS880: Approximations Algorithms Scribe: Dave Anrzejewski Lecturer: Shuchi Chawla Topic: Metropolis metho, volume estimation Date: 4/26/07 The previous lecture iscusse they some of the key concepts of

More information

Discrete Mathematics

Discrete Mathematics Discrete Mathematics 309 (009) 86 869 Contents lists available at ScienceDirect Discrete Mathematics journal homepage: wwwelseviercom/locate/isc Profile vectors in the lattice of subspaces Dániel Gerbner

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University

More information

Lecture 5. Symmetric Shearer s Lemma

Lecture 5. Symmetric Shearer s Lemma Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here

More information

All s Well That Ends Well: Supplementary Proofs

All s Well That Ends Well: Supplementary Proofs All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee

More information

QF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim

QF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim QF101: Quantitative Finance September 5, 2017 Week 3: Derivatives Facilitator: Christopher Ting AY 2017/2018 I recoil with ismay an horror at this lamentable plague of functions which o not have erivatives.

More information

arxiv: v2 [math.pr] 27 Nov 2018

arxiv: v2 [math.pr] 27 Nov 2018 Range an spee of rotor wals on trees arxiv:15.57v [math.pr] 7 Nov 1 Wilfrie Huss an Ecaterina Sava-Huss November, 1 Abstract We prove a law of large numbers for the range of rotor wals with ranom initial

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

arxiv: v4 [math.pr] 27 Jul 2016

arxiv: v4 [math.pr] 27 Jul 2016 The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,

More information

Group Importance Sampling for particle filtering and MCMC

Group Importance Sampling for particle filtering and MCMC Group Importance Sampling for particle filtering an MCMC Luca Martino, Víctor Elvira, Gustau Camps-Valls Image Processing Laboratory, Universitat e València (Spain). IMT Lille Douai CRISTAL (UMR 989),

More information

A Sketch of Menshikov s Theorem

A Sketch of Menshikov s Theorem A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p

More information

A simple model for the small-strain behaviour of soils

A simple model for the small-strain behaviour of soils A simple moel for the small-strain behaviour of soils José Jorge Naer Department of Structural an Geotechnical ngineering, Polytechnic School, University of São Paulo 05508-900, São Paulo, Brazil, e-mail:

More information

Level Construction of Decision Trees in a Partition-based Framework for Classification

Level Construction of Decision Trees in a Partition-based Framework for Classification Level Construction of Decision Trees in a Partition-base Framework for Classification Y.Y. Yao, Y. Zhao an J.T. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canaa S4S

More information

Euler equations for multiple integrals

Euler equations for multiple integrals Euler equations for multiple integrals January 22, 2013 Contents 1 Reminer of multivariable calculus 2 1.1 Vector ifferentiation......................... 2 1.2 Matrix ifferentiation........................

More information

On a limit theorem for non-stationary branching processes.

On a limit theorem for non-stationary branching processes. On a limit theorem for non-stationary branching processes. TETSUYA HATTORI an HIROSHI WATANABE 0. Introuction. The purpose of this paper is to give a limit theorem for a certain class of iscrete-time multi-type

More information

The total derivative. Chapter Lagrangian and Eulerian approaches

The total derivative. Chapter Lagrangian and Eulerian approaches Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

Range and speed of rotor walks on trees

Range and speed of rotor walks on trees Range an spee of rotor wals on trees Wilfrie Huss an Ecaterina Sava-Huss May 15, 1 Abstract We prove a law of large numbers for the range of rotor wals with ranom initial configuration on regular trees

More information

ON MULTIPLE TRY SCHEMES AND THE PARTICLE METROPOLIS-HASTINGS ALGORITHM

ON MULTIPLE TRY SCHEMES AND THE PARTICLE METROPOLIS-HASTINGS ALGORITHM ON MULTIPLE TRY SCHEMES AN THE PARTICLE METROPOLIS-HASTINGS ALGORITHM L. Martino, F. Leisen, J. Coraner University of Helsinki, Helsinki (Finlan). University of Kent, Canterbury (UK). ABSTRACT Markov Chain

More information

INDEPENDENT COMPONENT ANALYSIS VIA

INDEPENDENT COMPONENT ANALYSIS VIA INDEPENDENT COMPONENT ANALYSIS VIA NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION Truth Rotate S 2 2 1 0 1 2 3 4 X 2 2 1 0 1 2 3 4 4 2 0 2 4 6 4 2 0 2 4 6 S 1 X 1 Reconstructe S^ 2 2 1 0 1 2 3 4 Marginal

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

Self-normalized Martingale Tail Inequality

Self-normalized Martingale Tail Inequality Online-to-Confience-Set Conversions an Application to Sparse Stochastic Banits A Self-normalize Martingale Tail Inequality The self-normalize martingale tail inequality that we present here is the scalar-value

More information

Calculus and optimization

Calculus and optimization Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function

More information

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences. S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

A. Exclusive KL View of the MLE

A. Exclusive KL View of the MLE A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2.

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2. Assignment 13 Exercise 8.4 For the hypotheses consiere in Examples 8.12 an 8.13, the sign test is base on the statistic N + = #{i : Z i > 0}. Since 2 n(n + /n 1) N(0, 1) 2 uner the null hypothesis, the

More information

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite

More information

Sharp Thresholds. Zachary Hamaker. March 15, 2010

Sharp Thresholds. Zachary Hamaker. March 15, 2010 Sharp Threshols Zachary Hamaker March 15, 2010 Abstract The Kolmogorov Zero-One law states that for tail events on infinite-imensional probability spaces, the probability must be either zero or one. Behavior

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

Lecture 6: Calculus. In Song Kim. September 7, 2011

Lecture 6: Calculus. In Song Kim. September 7, 2011 Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear

More information

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides Reference 1: Transformations of Graphs an En Behavior of Polynomial Graphs Transformations of graphs aitive constant constant on the outsie g(x) = + c Make graph of g by aing c to the y-values on the graph

More information

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS Yannick DEVILLE Université Paul Sabatier Laboratoire Acoustique, Métrologie, Instrumentation Bât. 3RB2, 8 Route e Narbonne,

More information

How to Minimize Maximum Regret in Repeated Decision-Making

How to Minimize Maximum Regret in Repeated Decision-Making How to Minimize Maximum Regret in Repeate Decision-Making Karl H. Schlag July 3 2003 Economics Department, European University Institute, Via ella Piazzuola 43, 033 Florence, Italy, Tel: 0039-0-4689, email:

More information

Monte Carlo Methods with Reduced Error

Monte Carlo Methods with Reduced Error Monte Carlo Methos with Reuce Error As has been shown, the probable error in Monte Carlo algorithms when no information about the smoothness of the function is use is Dξ r N = c N. It is important for

More information

New Simple Controller Tuning Rules for Integrating and Stable or Unstable First Order plus Dead-Time Processes

New Simple Controller Tuning Rules for Integrating and Stable or Unstable First Order plus Dead-Time Processes Proceeings of the 3th WSEAS nternational Conference on SYSTEMS New Simple Controller Tuning Rules for ntegrating an Stable or Unstable First Orer plus Dea-Time Processes.G.ARVANTS Department of Natural

More information

Extreme Values by Resnick

Extreme Values by Resnick 1 Extreme Values by Resnick 1 Preliminaries 1.1 Uniform Convergence We will evelop the iea of something calle continuous convergence which will be useful to us later on. Denition 1. Let X an Y be metric

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1 Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of

More information

05 The Continuum Limit and the Wave Equation

05 The Continuum Limit and the Wave Equation Utah State University DigitalCommons@USU Founations of Wave Phenomena Physics, Department of 1-1-2004 05 The Continuum Limit an the Wave Equation Charles G. Torre Department of Physics, Utah State University,

More information

Quantum mechanical approaches to the virial

Quantum mechanical approaches to the virial Quantum mechanical approaches to the virial S.LeBohec Department of Physics an Astronomy, University of Utah, Salt Lae City, UT 84112, USA Date: June 30 th 2015 In this note, we approach the virial from

More information

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods Hyperbolic Moment Equations Using Quarature-Base Projection Methos J. Koellermeier an M. Torrilhon Department of Mathematics, RWTH Aachen University, Aachen, Germany Abstract. Kinetic equations like the

More information

THE GENUINE OMEGA-REGULAR UNITARY DUAL OF THE METAPLECTIC GROUP

THE GENUINE OMEGA-REGULAR UNITARY DUAL OF THE METAPLECTIC GROUP THE GENUINE OMEGA-REGULAR UNITARY DUAL OF THE METAPLECTIC GROUP ALESSANDRA PANTANO, ANNEGRET PAUL, AND SUSANA A. SALAMANCA-RIBA Abstract. We classify all genuine unitary representations of the metaplectic

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners

Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners Lower Bouns for Local Monotonicity Reconstruction from Transitive-Closure Spanners Arnab Bhattacharyya Elena Grigorescu Mahav Jha Kyomin Jung Sofya Raskhonikova Davi P. Wooruff Abstract Given a irecte

More information

Parameter estimation: A new approach to weighting a priori information

Parameter estimation: A new approach to weighting a priori information Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a

More information

Non-Linear Bayesian CBRN Source Term Estimation

Non-Linear Bayesian CBRN Source Term Estimation Non-Linear Bayesian CBRN Source Term Estimation Peter Robins Hazar Assessment, Simulation an Preiction Group Dstl Porton Down, UK. probins@stl.gov.uk Paul Thomas Hazar Assessment, Simulation an Preiction

More information

Final Exam Study Guide and Practice Problems Solutions

Final Exam Study Guide and Practice Problems Solutions Final Exam Stuy Guie an Practice Problems Solutions Note: These problems are just some of the types of problems that might appear on the exam. However, to fully prepare for the exam, in aition to making

More information