YVES F. ATCHADÉ, GARETH O. ROBERTS, AND JEFFREY S. ROSENTHAL
TOWARDS OPTIMAL SCALING OF METROPOLIS-COUPLED MARKOV CHAIN MONTE CARLO

Abstract. We consider optimal temperature spacings for Metropolis-coupled Markov chain Monte Carlo (MCMCMC) and Simulated Tempering algorithms. We prove that, under certain conditions, it is optimal (in terms of maximising the expected squared jumping distance) to space the temperatures so that the proportion of temperature swaps which are accepted is approximately 0.234. This generalises related work by physicists, and is consistent with previous work about optimal scaling of random-walk Metropolis algorithms.

1. Introduction

The Metropolis-coupled Markov chain Monte Carlo (MCMCMC) algorithm [7], also known as parallel tempering or the replica exchange method, is a version of the Metropolis-Hastings algorithm [17, 9] which is very effective in dealing with problems of multi-modality. The algorithm works by simulating multiple copies of the target (stationary) distribution, each at a different temperature. Through a swap move, the algorithm allows copies at lower temperatures to borrow information from high-temperature copies and thus escape from local modes. The performance of MCMCMC depends crucially on the temperatures used for the different copies. There is an ongoing discussion (especially in the physics literature) on how best to select these temperatures. In this note, we show that this question can be partially addressed using the optimal scaling framework initiated in [20].

MCMCMC arises as follows. We are interested in sampling from a target probability distribution π(·) having (complicated, probably multimodal, and possibly un-normalised) target density f(x) on some state space X, which is an open subset of R^d for some (large) dimension d. We define a sequence of associated tempered distributions f^(β_j)(x), where 0 ≤ β_n < β_{n−1} < ... < β_1 < β_0 = 1 are the selected inverse temperature values, subject to the restriction that f^(β_0)(x) = f^(1)(x) = f(x).
(Usually the densities f^(β_j) are simply powers of the original density, i.e. f^(β_j)(x) = (f(x))^(β_j); in this case, if X has infinite volume, then we require β_n > 0.)

Date: July 30, 2009; last revised April.
Mathematics Subject Classification. 60J10, 65C05.
Key words and phrases. Metropolis-Coupled MCMC, Simulated tempering, Optimal scaling.
Y. Atchadé: Department of Statistics, University of Michigan, 1085 South University, Ann Arbor, MI 48109, United States. E-mail address: yvesa@umich.edu.
G. O. Roberts: Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK. E-mail address: gareth.o.roberts@warwick.ac.uk.
J. S. Rosenthal: Department of Statistics, University of Toronto, Toronto, Ontario, Canada, M5S 3G3. E-mail address: jeff@math.toronto.edu.
Partially supported by NSERC of Canada.
MCMCMC proceeds by running one chain at each of the n + 1 values of β. It has state space X^(n+1), with unnormalised stationary density ∏_{j=0}^{n} f^(β_j)(x_j), where each x_j corresponds to a chain at the fixed inverse temperature β_j having stationary density f^(β_j)(x_j). In particular, the β_0 chain has stationary density f^(β_0)(x_0) = f(x_0), so that after a long run the samples x_0 should approximately correspond to the density f(x), which is the density of actual interest. The hope is that the chains corresponding to smaller β (i.e., to higher temperatures) can mix more easily, and can then lend this mixing information to the β_0 chain, thus speeding up convergence of the x_0 values to the density f(x).

To achieve this, MCMCMC alternates between two different types of dynamics. On some iterations, it attempts a within-temperature move, by updating each x_j according to some type of MCMC update (e.g., a usual random-walk Metropolis algorithm) for which f^(β_j) is a stationary density. On other iterations, it attempts a temperature swap, which consists of choosing two different inverse temperatures, say β_j and β_k, and then proposing to swap their respective chain values, i.e. to interchange the current values of x_j and x_k. This proposed swap is then accepted according to the usual (symmetric) Metropolis algorithm probabilities, i.e. it is accepted with probability

min( 1, [f^(β_j)(x_k) f^(β_k)(x_j)] / [f^(β_j)(x_j) f^(β_k)(x_k)] ),   (1)

otherwise it is rejected and the values of x are left unchanged.

Such algorithms lead to many interesting questions and have been widely studied, see e.g. [7, 11, 12, 18, 6, 13, 15, 5, 26, 27]. This note will concentrate on the specific question of optimising the choice of the β values to achieve maximal efficiency in the temperature swaps.
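The alternating dynamics above can be sketched in code. The following is a minimal illustration (not the authors' implementation), assuming a one-dimensional state per chain, tempered densities f^(β) = f^β, a user-supplied log-density `logf`, and arbitrary illustrative choices of proposal scale and adjacent-pair swap selection.

```python
import math, random

def swap_accept_prob(logf, beta_j, beta_k, x_j, x_k):
    # Metropolis probability (1) for proposing to interchange x_j and x_k,
    # with tempered log-densities beta * logf(x)
    log_ratio = (beta_j * logf(x_k) + beta_k * logf(x_j)
                 - beta_j * logf(x_j) - beta_k * logf(x_k))
    return math.exp(min(0.0, log_ratio))  # min(1, e^{log_ratio})

def mcmcmc_sweep(logf, betas, xs, scale=1.0, rng=random):
    # one within-temperature random-walk Metropolis update per chain,
    # then one attempted swap of an adjacent pair of inverse temperatures
    for j, beta in enumerate(betas):
        y = xs[j] + rng.gauss(0.0, scale)
        if rng.random() < math.exp(min(0.0, beta * (logf(y) - logf(xs[j])))):
            xs[j] = y
    j = rng.randrange(len(betas) - 1)  # propose swapping chains j and j+1
    if rng.random() < swap_accept_prob(logf, betas[j], betas[j + 1], xs[j], xs[j + 1]):
        xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs
```

Note that only the ratio in (1) is ever needed, so unnormalised densities suffice.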
Specifically, suppose we wish to design the algorithm such that the chain at inverse temperature β will propose to swap with another chain at inverse temperature β + ε. What choice of ε is best? Obviously, if ε is very large, then such swaps will usually be rejected. Similarly, if ε is very small, then such swaps will usually be accepted, but will not greatly improve mixing. Hence, the optimal ε is somewhere between the two extremes (this is sometimes called the "Goldilocks Principle"), and our task is to identify it.

Our main result (Theorem 1) will show that, in a certain limiting sense under certain assumptions, it is optimal (in terms of maximising the expected squared jumping distance of the influence of hotter chains on colder ones) to choose the spacing ε such that the probability of such a swap being accepted is equal to 0.234. This is the same optimal acceptance probability derived previously for certain random-walk Metropolis algorithms [20, 22], and generalises some results in the physics literature [11, 18]. It also has connections (Section 5.1) to the Dirichlet form of an associated temperature process.

We shall also consider (Section 4) the related Simulated Tempering algorithm, and shall prove that under certain conditions the optimal 0.234 acceptance rate applies there as well. We shall also compare (Corollary 1) Simulated Tempering to MCMCMC, and see that in a certain sense the former is exactly twice as efficient as the latter, but there are various mitigating factors, so the comparison is far from clear-cut.
Toy Example. To illustrate the importance of temperature spacings, and of measuring the influence of hotter chains on colder ones, we consider a very simple toy example. Suppose the state space consists of just 101 discrete points, X = {0, 1, 2, 3, ..., 100}, with (un-normalised) target density (with respect to counting measure on X) given by π{x} = 2^(−x) + 2^(−(100−x)) for x ∈ X. That is, π has two modes, at 0 and 100, with a virtually insurmountable barrier of low-probability states between them (Figure 1, top-left). Let the tempered densities be given (as usual) by powers of π, i.e. f^(β)(x) = (π(x))^β. Thus, for very small β, the insurmountable barrier is nicely removed (Figure 1, top-middle). Can we make use of such tempered densities to improve convergence of the original chain to π?

To be specific, suppose each fixed-β chain is a random-walk Metropolis algorithm with proposal kernel q(x, x + 1) = q(x, x − 1) = 1/2. Thus, the cold (large β) chains are virtually incapable of moving from state 0 to state 100 or back (Figure 1, top-left), but the hot (small β) chains have no such obstacle (Figure 1, top-middle). The question is, what values of the inverse temperatures β will best allow an MCMCMC algorithm to benefit from the rapid mixing of the hot chains, to provide good mixing for the cold chain?

Figure 1: The toy example's target density (top-left), tempered density at a small inverse-temperature β (top-middle), random-walk Metropolis run (top-right), and trace plots of the cold chain over 10,000 full iterations of MCMCMC runs with two temperatures (bottom-left), ten temperatures (bottom-middle), and fifty temperatures (bottom-right).
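The toy target and its tempered powers are explicit enough to verify directly. The following sketch (our own illustration, not from the paper) computes the within-temperature acceptance ratio for a one-step move and shows how tempering flattens the barrier.

```python
import math

def log_pi(x):
    # log of the toy target pi{x} = 2^(-x) + 2^(-(100 - x)) on X = {0, ..., 100}
    return math.log(2.0 ** (-x) + 2.0 ** (-(100 - x)))

def move_accept_prob(beta, x, y):
    # acceptance probability min(1, (pi(y)/pi(x))^beta) for the
    # within-temperature random-walk Metropolis move x -> y
    return min(1.0, math.exp(beta * (log_pi(y) - log_pi(x))))

# at beta = 1 the move 0 -> 1 halves the density, so it is accepted w.p. ~1/2,
# while at a small beta the barrier is nearly flat and moves are almost free
cold = move_accept_prob(1.0, 0, 1)
hot = move_accept_prob(0.01, 0, 1)
```

Each successive cold-chain step away from a mode is accepted with probability about 1/2, so reaching the other mode unaided requires an astronomically unlikely run of acceptances, matching the "virtually insurmountable barrier" described above.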
To test this, we ran an MCMCMC algorithm for 10,000 full iterations (each consisting of one update at each inverse-temperature, plus one attempted swap of adjacent inverse-temperature values), in each of four settings: with just one inverse-temperature, i.e. an ordinary MCMC algorithm with no temperature swaps (Figure 1, top-right); with two inverse-temperatures, 1 and a much smaller value (Figure 1, bottom-left); with ten geometrically-spaced inverse-temperatures, with exponents j/9 for j = 0, 1, 2, ..., 9 (Figure 1, bottom-middle); and with fifty geometrically-spaced inverse-temperatures, with exponents j/49 for j = 0, 1, 2, ..., 49 (Figure 1, bottom-right). We started each run with all chains at the state 0, to investigate the extent to which they were able to mix well and effectively sample from the equally-large mode at state 100.

The results of our runs were that the ordinary MCMC algorithm with no temperature swaps did not traverse the barrier at all, and just stayed near the state 0 (Figure 1, top-right). Adding one additional temperature improved this somewhat (Figure 1, bottom-left), and the cold chain was now able to sometimes sample from states near 100, but only occasionally. Using ten temperatures improved this greatly and led to quite good mixing of the cold chain (Figure 1, bottom-middle). Perhaps most interestingly, using lots more temperatures (fifty) did not improve mixing further, but actually made it much worse (Figure 1, bottom-right), despite the greatly increased computation time (since each temperature requires its own chain values to be updated at each iteration).

So, we see that in this example, ten geometrically-spaced temperatures performed much better than two temperatures (too few) or fifty temperatures (too many). Intuitively, with only two temperatures, it is too difficult for the algorithm to effectively swap values between adjacent temperatures (indeed, only about 5% of proposed swaps were accepted).
By contrast, with fifty temperatures, virtually all swaps are accepted, but too many swaps are required before the rapidly-mixing hot chains can have much influence over the values of the cold chain, so this again does not lead to an efficient algorithm. Best is a compromise choice, ten temperatures, which makes the temperature differences small enough so swaps are easily accepted, but still large enough that they lead to efficient interchange between hot and cold chains and thus to efficient convergence of the cold chain to π. That is, we would like the hot and cold chains' values to be able to influence each other effectively, which requires lots of successful swaps of significant distance in the inverse-temperature domain.

This leads to the question of how to select the number and spacing of the inverse-temperature values for a given problem, which is the topic of this paper. To maximise the influence of the hot chain on the cold chain, we want to maximise the effective speed with which the chain values move along in the inverse-temperature domain. We shall do this by means of the expected squared jumping distance (ESJD) of this influence, defined below. Our main result (Theorem 1 below) says that to maximise ESJD, it is optimal (under certain assumptions) to select the inverse-temperatures so that the probability of accepting a proposed swap between adjacent inverse-temperature values is approximately 23%.

2. Optimal Temperature Spacings for MCMCMC

In this section, we consider the question of optimal temperature choice for MCMCMC algorithms. To fix ideas, we consider the specific situation in which the algorithm is proposing to swap the chain values at two specific inverse temperatures,
namely β and β + ε, where β, ε > 0 and β + ε ≤ 1. We shall consider the question of what value of ε is optimal in terms of maximising the influence of the hot chain on the cold chain. We shall focus on the asymptotic situation in which the chain has already converged to its target stationary distribution, i.e. that x ∼ ∏_{j=0}^{n} f^(β_j).

To measure this precisely, let us define γ = β + ε if the swap is accepted, or γ = β if the swap is rejected. Then γ is an indication of where the β chain's value has travelled to, in the inverse temperature space. That is, if the proposed swap is accepted, then the value moves from β to γ = β + ε for a total inverse-temperature distance of γ − β = ε, while if the swap is rejected then it does not move at all, i.e. the distance it moves is γ − β = 0. Hence, γ − β indicates the extent to which the swap proposal succeeded in moving different x values around the β space, which is essential for getting benefit from the MCMCMC algorithm. So, the larger the magnitude of γ − β, the more efficient are the swap moves at providing mixing in the temperature domain. We therefore define the optimal choice of ε to be the one which maximises the stationary (i.e. asymptotic) expected squared jumping distance (ESJD) of this movement, i.e. which maximises

ESJD = E_π[(γ − β)^2] = ε^2 E_π[P(swap accepted)] ≡ ε^2 ACC,   (2)

where from (1),

ACC = E_π[P(swap accepted)] = E_π[ min( 1, [f^(β_j)(x_k) f^(β_k)(x_j)] / [f^(β_j)(x_j) f^(β_k)(x_k)] ) ]   (3)

and where the expectations are with respect to the stationary (asymptotic) distribution x ∼ ∏_{j=0}^{n} f^(β_j). We shall also consider the generalisation of (2) to

ESJD(h) = E_π[(h(γ) − h(β))^2]   (4)

for some h : [0, 1] → R; then (2) corresponds to the case where h is the identity function. Of course, (2) and (4) represent just some possible measures of optimality, and other measures might not necessarily lead to equivalent optimisations; for some discussion related to this issue see Section 5.
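In the idealised setting where each tempered distribution is sampled exactly, ACC in (3) and hence ESJD in (2) can be estimated by plain Monte Carlo. A sketch for a product standard-normal target (our own illustration; the parameter values are arbitrary):

```python
import math, random

def estimate_acc_and_esjd(d, beta, eps, n=4000, seed=1):
    # Monte Carlo estimate of ACC in (3) and ESJD = eps^2 * ACC in (2) for the
    # product target f(x) = prod_i exp(-x_i^2/2), drawing x ~ f^beta and
    # y ~ f^(beta+eps) exactly (the idealised stationary setting of the text)
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        # g_d(x) = log f(x) = -sum_i x_i^2 / 2;
        # the swap acceptance ratio is exp(eps * (g_d(x) - g_d(y)))
        gx = sum(-0.5 * rng.gauss(0.0, 1.0 / math.sqrt(beta)) ** 2 for _ in range(d))
        gy = sum(-0.5 * rng.gauss(0.0, 1.0 / math.sqrt(beta + eps)) ** 2 for _ in range(d))
        acc += math.exp(min(0.0, eps * (gx - gy)))
    acc /= n
    return acc, eps * eps * acc
```

Scanning `eps` with such an estimator and keeping the value that maximises the returned ESJD is exactly the optimisation problem analysed below.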
To make progress on computing the optimal ε, we restrict (following [20, 22]) to the special case where

f(x) = ∏_{i=1}^{d} f(x_i),   (5)

i.e. the target density takes on a special product form. (Although (5) is a very restrictive assumption, it is known [20, 22] that conclusions drawn from this special case are often approximately applicable in much broader contexts.) We also assume that the tempered distributions are simply powers of the original density (which is the usual case), i.e. that

f^(β)(x) = ∏_{i=1}^{d} f^(β)(x_i) = ∏_{i=1}^{d} ( f(x_i) )^β.   (6)

Intuitively, as the dimension d gets large, so that small changes in β lead to larger and larger changes in f^(β), the inverse-temperature spread ε must decrease to preserve a non-vanishing probability of accepting a proposed swap. Hence, similar
to [20, 22], we shall consider the limit as d → ∞ and correspondingly ε → 0. To get a non-trivial limit, we take

ε = ℓ d^(−1/2)   (7)

for some positive constant ℓ to be determined. (Choosing a smaller scaling would correspond to taking ℓ → 0, while choosing a larger scaling would correspond to letting ℓ → ∞; either choice is sub-optimal, since the optimal ℓ will be strictly between 0 and ∞, as we shall see.)

2.1. Main Result. Under these assumptions, we shall prove the following (where Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^(−s²/2) ds is the cdf of a standard normal):

Theorem 1. Consider the MCMCMC algorithm as described above, assuming (5), (6), and (7). Then as d → ∞, the ESJD of (2) is maximised when ℓ is chosen to maximise ℓ² · 2 Φ( −ℓ √(I(β)/2) ), where I(β) > 0 is defined in (10) below. Furthermore, for this optimal choice of ℓ, the corresponding probability of accepting a proposed swap is given (to three decimal points) by 0.234. In fact, the relationship between ESJD of (2) and ACC of (3) is given by

ESJD = (2/I(β)) ACC ( Φ^(−1)(ACC/2) )²   (8)

(see Figure 2). Finally, the optimal choice of ℓ also maximises ESJD(h) of (4), for any differentiable function h : [0, 1] → R.

Figure 2: A graph of the relationship between the expected squared jumping distance (ESJD) and asymptotic acceptance rate (ACC), as described in equation (8), in units where I(β) = 1.

Proof. For this MCMCMC algorithm, the acceptance probability of a temperature swap is given by α = 1 ∧ e^B,
where

e^B = [ f^(β)(y) f^(β+ε)(x) ] / [ f^(β)(x) f^(β+ε)(y) ],

and where x ∼ f^(β) and y ∼ f^(β+ε) are independent and in stationarity. We write g = log f, so that g^(β)(x) = log f^(β)(x), etc. We then have that

B = [ g^(β)(y) − g^(β+ε)(y) ] − [ g^(β)(x) − g^(β+ε)(x) ] = T(y) − T(x),

where

T(x) = g^(β)(x) − g^(β+ε)(x) = β g_d(x) − (β + ε) g_d(x) = −ε g_d(x) = −ε ∑_{i=1}^{d} g(x_i),

with g_d(x) ≡ ∑_{i=1}^{d} g(x_i) = log f(x).

Now, write E_β for expectation with respect to the distribution having density proportional to f^β, and similarly for Var_β. Then in this distribution, g has mean

M(β) ≡ E_β(g) = ∫ log f(x) f^β(x) dx / ∫ f^β(x) dx,   (9)

and variance

I(β) ≡ Var_β(g) = ∫ (log f(x))² f^β(x) dx / ∫ f^β(x) dx − M(β)².   (10)

Hence, in the distribution proportional to f^β, T(x) has mean E_β(T(x)) = −ε d M(β) ≡ μ(β), and variance Var_β(T(x)) = ε² d I(β) ≡ σ(β)².

Now, taking the derivative with respect to β (using the quotient rule),

M'(β) = [ ∫ (log f(x))² f^β(x) dx · ∫ f^β(x) dx − ( ∫ log f(x) f^β(x) dx )² ] / ( ∫ f^β(x) dx )² = I(β),   (11)

from which it follows that μ'(β) = −ε d M'(β) = −ε d I(β) = −σ(β)²/ε.

Now, recall that ε = ℓ d^(−1/2). Hence,

T(y) − T(x) = −ℓ d^(−1/2) ∑_{i=1}^{d} ( g(y_i) − g(x_i) ).

We claim that, as d → ∞, T(y) − T(x) converges weakly to a random variable A ∼ N( −σ(β)², 2σ(β)² ), where σ(β)² = ε² d I(β) = ℓ² I(β). To see this, let φ_d(t) be the characteristic function of T(y) − T(x). Then we have:

φ_d(t) = E[ e^{it(T(y) − T(x))} ] = e^{−it ε d (M(β+ε) − M(β))} { E( e^{−it ε ḡ(y_1)} ) E( e^{it ε ḡ(x_1)} ) }^d,

where ḡ(x) = g(x) − E_β(g(x)). Since M(β+ε) − M(β) = ε M'(β) + o(ε) and ε = ℓ d^(−1/2), it follows that lim_{d→∞} −it ε d (M(β+ε) − M(β)) = −it ℓ² I(β). Hence, using a Taylor series expansion,

E( e^{−it ε ḡ(y_1)} ) E( e^{it ε ḡ(x_1)} ) = ( 1 − (ℓ² t² / 2d) I(β+ε) + o(1/d) ) ( 1 − (ℓ² t² / 2d) I(β) + o(1/d) )
= 1 − (ℓ² t² / d) I(β) + o(1/d).

Hence, φ_d(t) converges to e^{−itℓ²I(β)} e^{−ℓ²t²I(β)} as d → ∞. This limiting function is the characteristic function of the distribution N( −ℓ² I(β), 2ℓ² I(β) ), thus proving the claim.

To continue, recall (e.g. [20], Proposition 2.4) that if A ∼ N(m, s²), then

E(1 ∧ e^A) = Φ(m/s) + e^{m + s²/2} Φ( −s − m/s ).

In particular, if A ∼ N(−c, 2c) (so that E(e^A) = 1), then

E(1 ∧ e^A) = 2 Φ( −√(c/2) ).

Since the function 1 ∧ e^B is a bounded function, it follows that as d → ∞, the acceptance rate of MCMCMC converges to

E(α) = E(1 ∧ e^B) → E( 1 ∧ exp( N( −σ(β)², 2σ(β)² ) ) ) = 2 Φ( −σ(β)/√2 ).

Now, with ε = ℓ d^(−1/2), we have ε² = ℓ²/d and σ(β)²/2 = ε² d I(β)/2 = ℓ² I(β)/2, whence σ(β)/√2 = ℓ [I(β)/2]^(1/2). Then as d → ∞,

P(accept swap) = E(1 ∧ e^B) → 2 Φ( −ℓ [I(β)/2]^(1/2) ),

and so

ESJD = ε² P(accept swap) ≈ (ℓ²/d) · 2 Φ( −ℓ [I(β)/2]^(1/2) ) ≡ d^(−1) e_MC(ℓ).   (12)

Hence, maximising ESJD is equivalent to choosing ℓ = ℓ_opt to maximise ℓ² · 2 Φ( −ℓ [I(β)/2]^(1/2) ), with the second factor being the acceptance probability. It then follows, as in [20, 22], that when ℓ = ℓ_opt, the acceptance probability becomes 0.234 (to three decimal places). Indeed, making the substitution u = ℓ [I(β)/2]^(1/2) shows that finding ℓ_opt is equivalent to finding the value û of u which maximises (4/I(β)) u² Φ(−u), and then evaluating â = 2 Φ(−û). It follows that the value of û, and hence also the value of â, does not depend on the value of I(β) (provided I(β) > 0). So, it suffices to assume I(β) = 1, in which case we compute numerically that û ≈ 1.1906, so that â = 2 Φ(−1.1906) ≈ 0.234.

Finally, if instead of ESJD we consider ESJD(h) ≡ E[ ( h(γ) − h(β) )² ] for some differentiable function h, then as ε → 0 we have ( h(γ) − h(β) )² ≈ [h'(β)]² (γ − β)², so ESJD(h) ≈ [h'(β)]² (ESJD). Hence, as a function of ε, maximising ESJD(h) is equivalent to maximising ESJD, and is thus maximised at the same value ε_opt.

2.2. Iteratively Selecting the Inverse Temperatures.
The above results show that, in high dimensions under certain assumptions, it is most efficient to choose the inverse temperatures for MCMCMC such that the average acceptance probability between any two adjacent temperatures is about 23%. This leads to the question of how to select such temperatures. Although this may not be a practical approach in general, we shall adopt an intensive iterative approach to this in order to assess the effects of our theory in applications.

We assume we know (perhaps by running some preliminary simulations) some sufficiently small inverse temperature value β such that the mixing of the chain with corresponding density f^β is sufficiently fast. This value β shall be our minimal inverse temperature. (In the examples below, we use β = 0.01.)
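The constant behind the 23% figure can be checked numerically. As a sketch (our own check, using only the standard library), we maximise the limiting efficiency u² Φ(−u) over u = ℓ √(I(β)/2) and read off the implied acceptance rate 2 Φ(−û):

```python
import math

def Phi(z):
    # standard normal cdf, written via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def esjd_proxy(u):
    # with u = l * sqrt(I(beta)/2), maximising l^2 * 2*Phi(-l*sqrt(I(beta)/2))
    # is equivalent (up to a constant factor) to maximising u^2 * Phi(-u)
    return u * u * Phi(-u)

# crude grid search for the maximiser u_hat and the implied acceptance rate
u_hat = max((i / 10000.0 for i in range(1, 40000)), key=esjd_proxy)
acc_hat = 2.0 * Phi(-u_hat)
```

The grid search recovers û ≈ 1.19 and an optimal acceptance rate â ≈ 0.234, independent of I(β), as claimed above.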
In terms of β, we construct the remaining β_j values iteratively, as follows. First, starting with β_0 = 1, we find β_1 such that the acceptance probability of a swap between β_0 = 1 and β_1 is approximately 0.23. Then, given β_1, we find β_2 such that the acceptance probability of the swap between β_1 and β_2 is again approximately 0.23. We continue in this way to construct β_3, β_4, ..., continuing until we have β_j ≤ β for some j. At that point, we replace β_j by β and stop.

To implement this iterative algorithm, we need to find for each inverse-temperature β a corresponding inverse-temperature β' < β such that the average acceptance probability of the swaps between β and β' is approximately 0.23. We use a simulation-based approach for this, via random variables {(X_n, X'_n, ρ_n)}_{n ≥ 0}. After initialisations for n = 0, then at each time n ≥ 1, we let β'_n = β (1 + e^{ρ_n})^{−1}, and draw X_{n+1} ∼ f^β and X'_{n+1} ∼ f^{β'_n} (or update them from some Markov chain dynamics preserving those distributions). The probability of accepting a swap between β and β'_n is then given by α_{n+1} = 1 ∧ e^{B_{n+1}}, where B_{n+1} = (β'_n − β) ( g_d(X_{n+1}) − g_d(X'_{n+1}) ). We then attempt to converge towards 0.23 by replacing ρ_n by ρ_{n+1} = ρ_n + n^{−1}(α_{n+1} − 0.23), i.e. using a stochastic approximation algorithm [19, 2]. This ensures that if α_{n+1} > 0.23 then β' will decrease, while if α_{n+1} < 0.23 then β' will increase, as it should. We shall use this iterative simulation approach to determine the inverse temperatures {β_j} in all of our simulation examples in Section 3 below.

2.3. Comparison with Geometric Temperature Spacing. In some cases, it is believed that the optimal choice of temperatures is geometric, i.e. that we want β_{j+1} = c β_j for an appropriate constant c. Now, the notation of Theorem 1 corresponds to setting β_j = β + ε and β_{j+1} = β. So, it is optimal to have β_{j+1} = c β_j precisely when ε_opt ∝ β in Theorem 1. Recall now that, from the end of the proof of Theorem 1, û ≡ ℓ_opt [I(β)/2]^(1/2) ≈ 1.1906, and in particular û does not depend on β.
Then ℓ_opt = û [2/I(β)]^(1/2). So, ε_opt = ℓ_opt d^(−1/2) = û d^(−1/2) [2/I(β)]^(1/2) ∝ I(β)^(−1/2). It follows that the condition ε_opt ∝ β is equivalent to the condition I(β)^(−1/2) ∝ β, i.e. I(β) ∝ β^(−2). While this does hold for some examples, e.g. the case when f(x) = e^{−|x|^r} (see Section 2.4 below), it does not hold for most other examples (e.g. for the Gamma distribution, various log x terms appear which do not lead to such a simple relationship). Thus, the optimal spacing of inverse temperatures is not necessarily geometric. In our simulations below, we consider both geometric spacings, and spacings found using our 23% rule. We shall see that the 23% rule leads to superior performance.

2.4. A Simple Analytical Example. Consider the specific example where the target density is given by (5) with f(x) = e^{−|x|^r} for some fixed r > 0. (This includes the Gaussian (r = 2) and Double-Exponential (r = 1) cases.) For this example, f^β(x) ∝ e^{−β|x|^r}, and g(x) = −|x|^r. We then compute from (9) that

M(β) = E_β(g) = ∫ g(x) f^β(x) dx / ∫ f^β(x) dx = −∫ |x|^r e^{−β|x|^r} dx / ∫ e^{−β|x|^r} dx
= −[ 2 Γ(1/r) β^{−(1+r)/r} / r² ] / [ 2 Γ(1 + 1/r) β^{−1/r} ] = −1/(rβ),

where Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt is the gamma function. Then by (11),

I(β) = Var_β(g) = M'(β) = 1/(rβ²) ∝ β^{−2}.

Hence, since I(β) ∝ β^{−2}, it follows as above that the spread of inverse-temperatures should be geometric, with β_{j+1} = c β_j for some constant c.

2.5. Relation to Previous Literature. The issue of temperature spacings and acceptance rates for MCMCMC has been widely studied in the physics literature [11, 12, 18, 6, 13, 5, 26, 27]. For example, the conclusion of Theorem 1, that the inverse temperatures should be chosen to make the swap acceptance probabilities all equal to the same value (here 0.234), is related to the "uniform exchange rates" criterion described in [10]. In addition, simulation studies [6] have suggested tuning the temperatures so the swap acceptance rate is about 20%, and analytic studies [13] have suggested an optimal rate of 0.23 in some specific circumstances, as we now discuss.

In physics, the canonical ensemble corresponds to writing f^(β)(x) = e^{−βV(x)}, where V is the energy function and 1/β is the temperature. Thus, in our notation, V(x) = −g_d(x). Furthermore, the average energy at inverse-temperature β is given by

U(β) = ∫ V(x) e^{−βV(x)} dx / ∫ e^{−βV(x)} dx.

Physicists have approached the problem using the concepts of density of states and heat capacity. If the state space X is discrete, the density of states is defined as the function G(u) := #{x ∈ X : V(x) = u}. If X is continuous, then G(u) is defined to be the (d−1)-dimensional Lebesgue measure of the level set {x ∈ X : V(x) = u}. The heat capacity of the system at inverse-temperature β is defined as

C(β) = −β² ∂U(β)/∂β = β² Var_β(V(X)).

In our notation, Var_β(V(X)) = Var_β(−g_d(X)) = d I(β), so C(β) = d β² I(β). Hence, C(β) is constant if and only if I(β) ∝ β^{−2}, as discussed in Section 2.3. Previous literature has considered, as have we, the idealised case in which exact sampling is done from each distribution f^β.
In this case, a swap in MCMCMC between β and β' (β < β') has acceptance probability

A(β, β') = E[ min( 1, e^{(β − β')(V(X) − V(X'))} ) ],

with the expectation taken with respect to X ∼ f^β and X' ∼ f^{β'}. Assuming that the heat capacity is constant (i.e., C(β) ≡ C), and that the density of states has the form

G(u) = ( 1 + (β_0/C)(u − u_0) )^C G(u_0),
[11] shows that the average acceptance probability of a swap between β and β' > β is given by:

A(β, β') = [ 2 Γ(2C + 2) / Γ(C + 1)² ] ∫_0^{β/β'} u^C (1 + u)^{−2(C+1)} du = [ 2 / B(C + 1, C + 1) ] ∫_0^{1/(1+R)} θ^C (1 − θ)^C dθ,   (13)

where R = β'/β > 1, Γ(x) is the Gamma function and B(a, b) the Beta function. Assuming a constant heat capacity and a more specific form for the density of states, G(u) = [ (2π)^{C+1} / Γ(C + 1) ] u^C, [18] derives the same formula as in (13) for A(β, β'), which they call the "incomplete beta function law" for parallel tempering. Letting C (and thus the dimension of the system) go to infinity, [12, 18] use (13) to derive a limiting expression for the acceptance probability of MCMCMC similar to that obtained in (12) above. [13] uses these limits to argue that 0.23 is approximately the optimal asymptotic swap acceptance rate, using arguments somewhat similar to ours (indeed, Figure 1 of [13] contains the same acceptance-versus-efficiency curve implied by (12)).

Now, it seems that the heat capacity C(β) being constant is quite a restrictive assumption; it is satisfied for the example f(x) = e^{−|x|^r} considered in Section 2.4, but is usually not satisfied for more complicated functions. By contrast, our assumption (6) does not impose any special conditions on the nature of the density function f(x).

3. Simulation Examples

3.1. A Simple Gaussian Example. We illustrate Theorem 1 with the following simulation example. We implement the MCMCMC described above for Gaussian target and tempered distributions. Specifically, we consider just two inverse-temperatures, β_0 ≡ 1 and β_1 with 0 < β_1 < 1. We define f(x) and f^(β)(x) as in (5) and (6), with f the density of a standard normal distribution N(0, 1), so that f^(β) = N(0, β^{−1}). We use a version of the algorithm where the within-temperature moves are given by a Random Walk Metropolis (RWM) algorithm with Gaussian proposal distribution N(x, (2.38²/(dβ)) I_d), which is asymptotically optimal [22].
Our algorithm attempts 20 within-temperature moves for every one time it attempts a temperature swap. We ran this algorithm for each of 50 different possible choices of β_1, each with 0 < β_1 < 1. For each such choice of β_1, we ran the algorithm for a total of 500,000 iterations to ensure good averaging, and then estimated the acceptance probability as the fraction of proposed swaps accepted, and the expected squared jump distance (ESJD) as the average of the squared temperature jump distances (γ − β_0)² (or, equivalently, as (β_0 − β_1)² times the fraction of accepted swaps). Figure 3 plots the estimated acceptance probabilities versus squared jump distances (SJD), for each of four different dimensions (10, 20, 50, and 100). We can see from these results that the swap acceptance probability at the optimal spacing is indeed close to 0.234, and it gets closer as the dimension increases, and furthermore the relationship between
the two quantities is given approximately by (8) and Figure 2, just as Theorem 1 predicts.

Figure 3: Squared jump distance (SJD) versus estimated acceptance probability for the Gaussian case f = N(0, 1) and f^β = N(0, β^{−1}), in dimensions (a) d = 10, (b) d = 20, (c) d = 50, and (d) d = 100.

3.2. Inhomogeneous Gaussian Example. We now modify the previous example to a non-i.i.d. case where f(x) = ∏_{i=1}^{d} f_i(x_i) with f_i = N(0, i²), so that f_i^β = N(0, i² β^{−1}). (The inhomogeneity implies, in particular, that assumption (5) is no longer satisfied.) The rest of the details remain exactly the same as for the previous example, except that the proposal distribution for the RWM within-temperature moves now has variance proportional to i² for the i-th coordinate, which is optimal in this case. The resulting plots (for dimensions d = 10 and d = 100) are given in Figure 4. Here again, an optimum value of β for the ESJD function emerges at a swap acceptance probability of approximately 23%. Again, the agreement with Theorem 1 and the relationship (8) and Figure 2 is striking. This is unsurprising given that this target distribution can be written as a collection of scalar linear transformations of Example 3.1, and Theorem 1 remains invariant under such transformations.
Figure 4: SJD versus acceptance probability for the inhomogeneous Gaussian case f_i = N(0, i²) and f_i^β = N(0, i² β^{−1}), in dimensions (a) d = 10 and (b) d = 100.

3.3. A Mixture Distribution Example. In this example and the next, we compare the two temperature scheduling strategies discussed in Sections 2.2 and 2.3. One strategy consists of selecting the inverse-temperatures as a geometric sequence; we saw that this strategy is optimal when I(β) ∝ β^{−2}, or equivalently when the heat capacity is constant. The other strategy consists of choosing the temperatures such that the acceptance probability between any two adjacent temperatures is about 23%. We report here a simulation example comparing the two strategies.

Consider the density f(x_1, ..., x_d) = ∏_{i=1}^{d} f(x_i; ω, μ, σ²) corresponding to a mixture of normal distributions, where d = 20, and

f(x; ω, μ, σ²) = ∑_{j=1}^{3} ω_j (1/√(2π σ_j²)) e^{−(x − μ_j)²/(2σ_j²)},

with ω = (1/3, 1/3, 1/3), μ = (−5, 0, 5), and σ = (0.2, 0.2, 0.2). For the 23% rule, we build the temperatures sequentially as described in Section 2.2, with β = 0.01. The number of chains is thus itself unknown initially, and turns out to be 9, as shown in the top line of Table 1. The geometric schedule uses that same number (9) of chains, geometrically spaced between β_0 ≡ 1 and the same minimal inverse temperature. (Thus, the geometric schedule is allowed to borrow the total number of chains from the 23% rule, which is in some sense overly generous to the geometric strategy.) Each chain was run for 200,000 iterations. We report the squared jump distances both in the temperature space and in the X-space. We observe that the 23% rule performs significantly better than the geometric scheduling, in terms of having larger average squared jumping distances in both the β space and the X space.

Table 1: Inverse-temperature schedules for the two different temperature scheduling strategies (0.23 rule and geometric spacing) for the mixture distribution example.
Table 2: Squared jumping distances E[|β_n − β_{n−1}|²] (in β-space) and E[‖X_n − X_{n−1}‖²] (in X-space) for the two different temperature scheduling strategies (0.23 rule and geometric spacing) for the mixture distribution example.
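For concreteness, the geometric schedule and the mixture target used in this example can be sketched as follows (our own illustration; the schedule helper is a generic geometric spacing, not the paper's code):

```python
import math

def geometric_schedule(beta_min, n_chains):
    # inverse temperatures beta_j = c^j, with c chosen so that the last
    # inverse temperature equals beta_min (geometric spacing down from 1)
    c = beta_min ** (1.0 / (n_chains - 1))
    return [c ** j for j in range(n_chains)]

def mixture_logpdf(x, weights=(1/3, 1/3, 1/3), means=(-5.0, 0.0, 5.0), sds=(0.2, 0.2, 0.2)):
    # log of the three-component normal mixture density f(x; omega, mu, sigma^2)
    dens = sum(w / (math.sqrt(2.0 * math.pi) * s) * math.exp(-0.5 * ((x - m) / s) ** 2)
               for w, m, s in zip(weights, means, sds))
    return math.log(dens)

betas = geometric_schedule(0.01, 9)  # 9 chains between 1 and 0.01, as in Table 1
```

The 23% schedule, by contrast, is produced by the stochastic-approximation procedure of Section 2.2 and generally differs from this geometric sequence.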
3.4. Ising Distribution. We now compare the two scheduling strategies using the Ising distribution on the N × N two-dimensional lattice, given by π(x) = exp(E(x))/Z, where Z is the normalising constant and

E(x) = J ( ∑_{i=1}^{N} ∑_{j=1}^{N−1} x_{ij} x_{i,j+1} + ∑_{i=1}^{N−1} ∑_{j=1}^{N} x_{ij} x_{i+1,j} ),   (14)

with x_{ij} ∈ {−1, 1}. Obviously, this distribution does not satisfy the assumption (5). For definiteness, we choose N = 50 and J = 0.45. The Ising distribution admits a phase transition at the critical temperature T_c = 2J/log(1 + √2) = 1.021, i.e. critical inverse-temperature β_c = 0.979, around which the heat capacity undergoes sharp variation. Tempering techniques like MCMCMC and Simulated Tempering can perform poorly near critical temperatures, because small changes in the temperature around T_c result in drastic variations of the properties of the distribution. We expect the 23% rule to outperform the geometric schedule in this case.

For our algorithm, for the within-temperature (i.e., X-space) moves, we use a Metropolis-Hastings algorithm with a proposal which consists of randomly selecting a position (i, j) on the lattice and flipping its spin from x_{ij} to −x_{ij}. We first determine iteratively the inverse-temperature points using the algorithm described in Section 2.2. Then, given the lowest and highest temperatures and the number of temperature points determined by this algorithm, we compute the geometric spacing. Figure 5 gives the selected temperatures (not inverse-temperatures, i.e. it shows T_j ≡ 1/β_j) from the two methods (the circles represent the 23% rule and the triangles represent the geometric spacing).

Figure 5: The temperatures {T_j} selected for the Ising model (triangle: geometric spacing; circle: 23% rule).

We note here that the 23% rule and the geometric spacing produce very different temperature points.
Interestingly, in order to maintain the right acceptance rate, the 23% rule puts more temperature points near the critical temperature (1.021); this is further illustrated in Table 3.
Table 3: Selected temperature values near $T_c = 1.021$ for the Ising model, under the 23% rule and the geometric rule.

Each version of MCMCMC was run for 5 million iterations. The average square jump distance in inverse-temperature space was almost twice as large for the 23% rule as for the geometric spacing, i.e. the 23% rule was almost twice as efficient. We also present the trace plots (subsampled every 1,000 iterations) of the function $\{E(X_n),\ n \ge 0\}$ during the simulation for each of the two algorithms in Figure 6, together with their autocorrelation functions. While the trace plots themselves appear similar, the autocorrelations indicate that the 23% rule has resulted in a faster-mixing (less correlated) sampler in the X-space as well. We confirm this by calculating the empirical average square jump distance in the X-space, $\mathrm{SJD}(X) = n^{-1} \sum_{j=1}^{n} |E(X_j) - E(X_{j-1})|^2$, which works out to be 1.6 times as large for the 23% rule as for the geometric sequence, i.e. the 23% rule is 1.6 times as efficient by this measure.

Figure 6: (a),(c) trace plots and (b),(d) autocorrelation functions of $E(X_n)$ for the Ising model, subsampled every 1,000 iterations, with (a),(b) geometric temperature spacing, and (c),(d) the 23% rule.
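The empirical square jump distance used in this comparison is straightforward to compute from a stored trace of energies; a minimal sketch (hypothetical helper name):

```python
import numpy as np

def empirical_sjd(trace):
    """Empirical average square jump distance of a scalar trace,
    SJD = n^{-1} * sum_{j=1}^{n} (trace[j] - trace[j-1])^2,
    as used for SJD(X) with trace[j] = E(X_j)."""
    t = np.asarray(trace, dtype=float)
    return float(np.mean(np.diff(t) ** 2))
```

The same function applies in $\beta$-space by passing the trace of inverse temperatures instead of energies.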
4. Simulated Tempering

We now consider the Simulated Tempering algorithm [16, 7]. This is somewhat similar to MCMCMC, except now there is just one particle, which can jump either within-temperature or between-temperatures. Again we focus on the between-temperatures move, and keep similar notation to before. The state space is now given by $\{\beta_0, \beta_1, \ldots, \beta_n\} \times X$, and the target density is proportional to

$$f(\beta, x) = e^{K(\beta)} \prod_{i=1}^{d} f^{\beta}(x_i) = e^{K(\beta)} f^{(\beta)}(x)$$

for some choice of normalising constants $K(\beta)$. A proposed temperature move from $(\beta, x)$ to $(\beta + \epsilon, x)$ is accepted with probability

$$\alpha = 1 \wedge \frac{e^{K(\beta+\epsilon)} f^{(\beta+\epsilon)}(x)}{e^{K(\beta)} f^{(\beta)}(x)}.$$

We again make the simplifying assumptions (5) and (6), and again let $\epsilon \to 0$ according to (7). We further assume that we have chosen the true normalising constants,

$$K(\beta) = -d \log\Big(\int f^{\beta}(x)\,dx\Big), \qquad (15)$$

so that $f(\beta, \cdot)$ is indeed a normalised density function. This choice makes the stationary distribution for $\beta$ uniform on $\{\beta_0, \beta_1, \ldots, \beta_n\}$; this appears to be optimal, since it allows the $\beta$ values to most easily and quickly migrate from the cold temperature $\beta_0$ to the hot temperature $\beta_n$ and then back to $\beta_0$ again, thus maximising the influence of the hot temperature on the mixing of the cold chain.

4.1. Main Result. Under the above assumptions, we prove the following analogue of Theorem 1:

Theorem 2. Consider the Simulated Tempering algorithm as described above, assuming (5), (6), (7), and (15). Then as $d \to \infty$, the ESJD of (2) is maximised when $l$ is chosen to maximise $l^2\, 2\Phi\big(-l\, I(\beta)^{1/2}/2\big)$. Furthermore, for this optimal choice of $l$, the corresponding probability of accepting a proposed swap is given (to three decimal points) by 0.234. In fact, the relationship between the ESJD of (2) and the ACC of (3) is given by

$$\mathrm{ESJD} = (4/I(\beta))\, \mathrm{ACC}\, \big(\Phi^{-1}(\mathrm{ACC}/2)\big)^2 \qquad (16)$$

(see Figure 7). Finally, this choice of $l$ also maximises the ESJD(h) of (4), for any differentiable function $h : [0, 1] \to \mathbb{R}$.
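A single between-temperatures move of Simulated Tempering, as defined by the acceptance probability above, can be sketched on a finite grid of inverse temperatures as follows (an illustrative sketch with hypothetical helper names; here `log_fx` denotes $\log f(x) = \sum_{i=1}^{d} g(x_i)$ at the current, fixed state $x$, and `K` holds the normalising constants of (15)):

```python
import math
import random

def st_log_accept(j, jp, log_fx, betas, K):
    """Log acceptance ratio for a Simulated Tempering temperature move
    from level j to level jp at fixed state x:
    log(alpha) = (K[jp] - K[j]) + (betas[jp] - betas[j]) * log_fx,
    since f^(beta)(x) = exp(beta * sum_i g(x_i))."""
    return (K[jp] - K[j]) + (betas[jp] - betas[j]) * log_fx

def st_temperature_move(j, log_fx, betas, K, rng=None):
    """Propose a neighbouring level j' = j +/- 1 and accept it with
    probability min(1, exp(st_log_accept(...)))."""
    rng = rng or random.Random()
    jp = j + rng.choice([-1, 1])
    if not 0 <= jp < len(betas):
        return j  # proposal falls off the temperature grid: reject
    if rng.random() < math.exp(min(0.0, st_log_accept(j, jp, log_fx, betas, K))):
        return jp
    return j
```

With the true constants $K(\beta)$ of (15), the resulting marginal chain on levels is uniform in stationarity, as noted above.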
Figure 7: A graph of the relationship between the expected square jumping distance (ESJD) and the asymptotic acceptance rate (ACC), as described in equations (16) (top, green) and (8) (bottom, blue), in units where $I(\beta) = 1$.

Proof. A proposed temperature move from $(\beta, x)$ to $(\beta + \epsilon, x)$ is accepted with probability

$$\alpha = 1 \wedge \frac{e^{K(\beta+\epsilon)} f^{(\beta+\epsilon)}(x)}{e^{K(\beta)} f^{(\beta)}(x)} = 1 \wedge e^{K(\beta+\epsilon) - K(\beta)}\, e^{\epsilon \sum_{i=1}^{d} g(x_i)} = 1 \wedge e^{(K(\beta+\epsilon) - K(\beta)) + \epsilon d M(\beta)}\, e^{\epsilon \sum_{i=1}^{d} \bar g(x_i)},$$

where $\bar g(x) = g(x) - M(\beta)$. If we choose $K(\beta) = -d \log\big(\int f^{\beta}(x)\,dx\big)$, then we compute (again using the quotient rule) by comparison with (9) that $K'(\beta) = -d\, M(\beta)$. Hence, from (11), $K''(\beta) = -d\, M'(\beta) = -d\, I(\beta)$. Therefore, by a Taylor series expansion,

$$K(\beta+\epsilon) - K(\beta) = \epsilon\, K'(\beta) + \tfrac{1}{2} \epsilon^2 K''(\beta_\epsilon) = -\epsilon\, d\, M(\beta) + \tfrac{1}{2} \epsilon^2 K''(\beta_\epsilon)$$

for some $\beta_\epsilon \in [\beta, \beta+\epsilon]$. It follows that

$$(K(\beta+\epsilon) - K(\beta)) + \epsilon\, d\, M(\beta) = \epsilon^2 K''(\beta_\epsilon)/2 = (l^2/2d)\, K''(\beta_\epsilon) = -(l^2/2)\, I(\beta_\epsilon).$$

Setting $\epsilon = d^{-1/2} l$ for some $l > 0$ and letting $d \to \infty$, it follows from the above and the central limit theorem that

$$\lim_{d\to\infty} 1 \wedge \frac{e^{K(\beta+\epsilon)} f^{(\beta+\epsilon)}(x)}{e^{K(\beta)} f^{(\beta)}(x)} = \lim_{d\to\infty} 1 \wedge \Big( e^{-(l^2/2) I(\beta_\epsilon)}\, e^{\epsilon \sum_{i=1}^{d} \bar g(x_i)} \Big) = \lim_{d\to\infty} 1 \wedge \Big( e^{-(l^2/2) I(\beta)}\, e^{l d^{-1/2} \sum_{i=1}^{d} \bar g(x_i)} \Big) = 1 \wedge e^{A},$$

where $A \sim N\big(-(l^2/2) I(\beta),\; l^2 \operatorname{Var}_\beta(g)\big) = N\big(-(l^2/2) I(\beta),\; l^2 I(\beta)\big)$. The rest of the argument is standard, just as in Theorem 1, and shows that for large $d$, the square jump distance of Simulated Tempering is approximately

$$\mathrm{ESJD} = l^2\, 2\Phi\big(-l\, I(\beta)^{1/2}/2\big) \equiv e_{ST}(l), \qquad (17)$$
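The limiting acceptance probability appearing at the end of the proof, $E[1 \wedge e^A]$ with $A \sim N(-(l^2/2)I(\beta),\, l^2 I(\beta))$, equals $2\Phi(-l\, I(\beta)^{1/2}/2)$; this identity can be checked numerically (an illustrative sketch, not part of the proof):

```python
import math
import random

def phi(z):
    """Standard normal CDF, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def limiting_acceptance_mc(l, I=1.0, n=200_000, seed=1):
    """Monte Carlo estimate of E[min(1, e^A)] for
    A ~ N(-(l^2/2) * I, l^2 * I), the limiting acceptance in Theorem 2.
    The closed form from the same normal calculation is
    2 * phi(-l * sqrt(I) / 2)."""
    rng = random.Random(seed)
    mu, sd = -(l * l / 2.0) * I, l * math.sqrt(I)
    s = sum(min(1.0, math.exp(rng.gauss(mu, sd))) for _ in range(n))
    return s / n
```

At the optimal $l \approx 2.38$ (with $I(\beta) = 1$), both the estimate and the closed form are approximately 0.234.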
which is again maximised by choosing $l$ such that the acceptance probability is (to three decimal places) equal to 0.234. The argument for (16) is identical to that for Theorem 1. $\square$

4.2. Comparison of Simulated Tempering and MCMCMC. Now that we have optimality results for both Simulated Tempering (Theorem 2) and MCMCMC (Theorem 1), it is natural to compare them. We have the following.

Corollary 1. Under assumptions (5), (6), (7), and (15), asymptotically as $d \to \infty$, the maximal value of ESJD for Simulated Tempering is precisely twice that for MCMCMC (cf. Figure 7). (More generally, the maximal ESJD(h) for Simulated Tempering is precisely twice that for MCMCMC, for any differentiable $h$.) Furthermore, the optimal choice of $l$ for Simulated Tempering is $\sqrt{2}$ times that for MCMCMC.

Proof. Comparing $e_{MC}(l)$ from (12) with $e_{ST}(l)$ from (17), we see that $e_{MC}(l) = \frac{1}{2}\, e_{ST}(l\sqrt{2})$. In particular, $\sup_l e_{MC}(l) = \frac{1}{2} \sup_l e_{ST}(l)$, and also $\operatorname{argsup}_l e_{MC}(l) = (1/\sqrt{2}) \operatorname{argsup}_l e_{ST}(l)$, which gives the result. $\square$

Corollary 1 says that, under appropriate conditions, the ESJD of MCMCMC is precisely half that of Simulated Tempering. That is, if Simulated Tempering has ideally-chosen normalisation constants $K(\beta)$ as in (15), then in some sense it is twice as efficient as MCMCMC. However, choosing $K(\beta)$ ideally may be difficult or impossible (though it may be possible in certain cases to learn these weights adaptively during the simulation, see e.g. [4]), and for non-optimal $K(\beta)$ this comparison no longer applies. By contrast, MCMCMC does not require $K(\beta)$ at all. Furthermore, a single temperature swap of MCMCMC updates two different chain values, and thus may be twice as valuable as a single temperature move under Simulated Tempering. If so, then the two different factors of two precisely cancel each other out. In addition, it may take more computation to do one within-temperature step of MCMCMC (involving $n+1$ particles) than one step of Simulated Tempering (involving just one particle).
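The algebra in the proof of Corollary 1, together with the 0.234 value from Theorem 2, can be illustrated by a simple grid search (a sketch assuming the form of $e_{ST}$ in (17) and the relation $e_{MC}(l) = \frac{1}{2} e_{ST}(l\sqrt{2})$ used in the proof, in units where $I(\beta) = 1$):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def e_st(l, I=1.0):
    """e_ST(l) = l^2 * 2 * Phi(-l * sqrt(I) / 2), as in (17)."""
    return l * l * 2.0 * phi(-l * math.sqrt(I) / 2.0)

def e_mc(l, I=1.0):
    """e_MC(l) = (1/2) * e_ST(l * sqrt(2)), the relation in Corollary 1."""
    return 0.5 * e_st(l * math.sqrt(2.0), I)

# Grid search for the maximisers of each efficiency curve.
grid = [i / 1000.0 for i in range(1, 8000)]
l_st = max(grid, key=e_st)
l_mc = max(grid, key=e_mc)
acc_at_optimum = 2.0 * phi(-l_st / 2.0)  # acceptance rate at the ST optimum
```

The search recovers an acceptance rate of about 0.234 at the optimum, an optimal-$l$ ratio of about $\sqrt{2}$, and a maximal-efficiency ratio of one half, matching the corollary.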
In summary, this comparison of the two algorithms involves various subtleties and is not entirely clear-cut, though Corollary 1 still provides certain insights into the relationship between them.

4.3. Langevin Dynamics for Simulated Tempering? One limitation of Simulated Tempering is that it performs a Random Walk Metropolis (RWM) algorithm in the $\beta$-space, where each proposal bears a high chance of being rejected if the value of the energy $\sum_{i=1}^{d} g(x_i)$ is not consistent with the proposed value of the temperature. Now, it is known [21] that Langevin dynamics (which use derivatives to improve the proposal distribution) are significantly more efficient than RWM when they can be applied. In the context of the inverse temperatures, a Langevin-style proposal distribution could take the form:

$$N\Big[\, \beta + \frac{\sigma^2}{2} \Big( \sum_{i=1}^{d} g(x_i) + K'(\beta) \Big),\; \sigma^2 \,\Big].$$
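Such a proposal could be sketched as follows (purely illustrative: the helper name and the choice of $\sigma$ are assumptions, and in practice $K'(\beta)$ would have to be estimated):

```python
import random

def langevin_beta_proposal(beta, energy, dK, sigma=0.05, rng=None):
    """Langevin-style proposal for the inverse temperature:
    beta' ~ N(beta + (sigma^2 / 2) * (energy + dK), sigma^2),
    where energy = sum_{i=1}^{d} g(x_i) and dK = K'(beta).
    Hypothetical helper: K'(beta) is typically unknown."""
    rng = rng or random.Random()
    mean = beta + 0.5 * sigma * sigma * (energy + dK)
    return rng.gauss(mean, sigma)
```

When the current energy exceeds its stationary mean $-K'(\beta)$, the drift is positive, so the proposal favours larger $\beta$ (smaller temperatures), as discussed in the text.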
Furthermore, in this case $-K'(\beta) = E_\beta\big(\sum_{i=1}^{d} g(X_i)\big)$. Therefore, such a Langevin proposal will compare the current value of $\sum_{i=1}^{d} g(x_i)$ to the average value $-K'(\beta)$. If $\sum_{i=1}^{d} g(x_i) \ge -K'(\beta)$, then smaller temperatures are more compatible and are more likely to be proposed (and vice versa). The main limitation of this idea is that in practice, we do not know the gradient $K'(\beta)$. This can perhaps be estimated during the simulation as the average of the energy $\sum_{i=1}^{d} g(x_i)$ at times $n$ where the inverse-temperature level $\beta$ is visited, but this needs more investigation and we do not pursue it here.

5. Discussion and Future Work

This paper has presented certain results about optimal inverse-temperature spacings. In particular, we have proved that for MCMCMC (Theorem 1) and Simulated Tempering (Theorem 2), it is optimal (under certain conditions) to space the inverse-temperatures so that the probability of accepting a temperature swap or move is approximately 23%. Our theorems were proved under the restrictive assumption (5), but we have seen in simulations (Section 3) that they continue to approximately apply even if this assumption is violated.

Theorems 1 and 2 were stated in terms of the expected square jumping distances (ESJD) of the inverse-temperatures, assuming the chain begins in stationarity. While this provides useful information about optimising the algorithm, it provides less information about the algorithm's long-term behaviour. In this section, we consider various related issues and possible future research directions.

5.1. Inverse-Temperature Process. It is possible to define an entire inverse-temperature process, $\{y_n\}_{n=0}^{\infty}$, living on the state space $Y = \{\beta_0, \beta_1, \ldots, \beta_n\}$. For Simulated Tempering (ST), this process is simply the inverse-temperature $\beta$ at each iteration. For MCMCMC, this process involves tracking the influence of a particular inverse-temperature's chain through the temperature swaps, i.e.
if $y_n = \beta_j$, and then the $\beta_j$ and $\beta_{j+1}$ chains successfully swap, then $y_{n+1} = \beta_{j+1}$ (otherwise $y_{n+1} = \beta_j$). In general, this $\{y_n\}$ process will not be Markovian. However, if we assume that the ST/MCMCMC algorithm does an automatic i.i.d. reset into the distribution $f^{(\beta)}$ immediately upon entering the inverse-temperature $\beta$ (i.e., where the chain values associated with each inverse-temperature $\beta$ are taken as i.i.d. draws from $f^{(\beta)}$ at each iteration), then $\{y_n\}$ becomes a Markov chain. Indeed, under this assumption, it corresponds to a simple random walk (with holding probabilities) on $Y$. In that case, it seems likely that as $d \to \infty$, if we appropriately shrink space by a factor of $d^{1/2}$ and speed up time by a factor of $d$, then as in the RWM case [20, 22], the $\{y_n\}$ will converge to a limiting Langevin diffusion process. In that case, maximising the speed of the limiting diffusion is equivalent to optimising the original algorithm (according to any optimisation measure, see [22]), and this would provide another justification of the values of $l_{\mathrm{opt}}$ in Theorems 1 and 2.

In terms of this $\{y_n\}$ Markov chain, the ESJD from (2) is simply the expected square jumping distance of this process. Furthermore, ESJD(h) from (4) is then the Dirichlet form corresponding to the Markov operator for $\{y_n\}$. Thus, Theorems 1 and 2
would also be saying that the given values of $l_{\mathrm{opt}}$ maximise the associated Dirichlet form.

Of course, this reset assumption that the moves within each inverse-temperature $\beta_j$ result in an immediate i.i.d. reset to the density $f^{(\beta_j)}$ is not realistic, since in true applications the convergence within each fixed-temperature chain would occur only gradually (especially for larger $\beta_j$). However, it is not entirely unrealistic either, for a number of reasons. For example, some algorithms might run many within-temperature moves for each single attempted temperature swap, thus making the within-temperature chains effectively mix much faster. Also, some within-temperature algorithms (e.g. Langevin diffusions, see [21]) have convergence time of smaller order ($O(d^{1/3})$) than that of the temperature-swapping random-walk behaviour ($O(d)$), so effectively the within-temperature chains converge immediately on a relative scale, which is equivalent to the reset assumption.

Proving convergence to limiting diffusions can be rather technical (see e.g. [20, 21]), so we do not pursue it here, but rather leave it for future work. In any case, assuming the limiting diffusion does hold (as it probably does), this provides new insight and context for the optimal behaviour of MCMCMC and ST algorithms.

5.2. Joint Diffusion Limits? If we do not make the reset assumption as above, then the process $\{y_n\}$ is not Markovian, so a diffusion limit seems far less certain. However, it is still possible to jointly consider the chain $\{x_n\}_{n=0}^{\infty}$ living on $X$ (for ST) or $X^{n+1}$ (for MCMCMC), together with the inverse-temperature process $\{y_n\}$. The joint process $(x_n, y_n)$ must be Markovian, and it is quite possible that it has its own joint diffusion limit and joint optimal scalings. This would become quite technical, and we do not pursue it here, but we consider it an interesting topic for future research.

5.3. Models For How the Algorithms Converge.
Related to the above is the question of a deeper understanding of how MCMCMC and ST make use of the tempered distributions to improve convergence. Significant progress was made in this direction in previous research (e.g. [26]), but more remains to be done. One possible simplified model is to assume the process does not move at all in the within-temperature direction, except at the single inverse-temperature $\beta = \bar\beta$, when it does an immediate i.i.d. reset to $f^{(\bar\beta)}$. At first we thought that such a model is insufficient, since it would then be exponentially difficult for the algorithm to climb from $\beta = \bar\beta$ to $\beta = \beta_0$, but the work of [26] shows that this is in fact not the case. So, we may explore this model further in future work. A compromise model is one where the state space $X$ is partitioned into a finite number of modes, and when entering any $\beta > 0$ the process does a reset into $f^{(\beta)}$ conditional on remaining in the current mode. Such a process assumes fast within-mode mixing at all temperatures, but between-mode mixing only at the hot temperature when $\beta = 0$. We believe that in this case, there is a joint diffusion limit of the joint $(\beta, \text{mode})$ process. If so, this would be interesting and would provide a good model for how the hot-temperature and intermediate-temperature mixing all works together.

5.4. State space invariance. Our results do not depend in any way on the update mechanism within the state space $X$. Moreover, although our results are proved
for the case of IID components, this observation immediately generalises our findings to the very large class of distributions which we can write as functions of IID components. In fact, the only thing which stops this being a completely general $d$-dimensional state-space result is the fact that ST on the transformed variables is a different method from the transformed ST algorithm. From a practical point of view, this might even suggest improved tempering schemes (rather than simply considering powered densities).

5.5. Implementation. Finding practical ways to implement MCMCMC and ST in light of our results has not been a focus of this paper, though this is an important issue for future consideration. The approach adopted in Subsection 2.2 is designed to be a thorough methodology to investigate the effects of Theorem 1, but cannot be considered a practical approach in general. Intriguing is the prospect of integrating our theory within an adaptive MCMC framework, as in for example [3, 1, 24, 23].

5.6. Parallelisation. As computational power becomes more and more widespread, there is increasing interest in running Monte Carlo algorithms in parallel on many machines, perhaps even using Graphics Processing Units (GPUs); see e.g. [8, 25, 14]. It would be interesting to consider, even in simple situations, how to optimise parallel computations asymptotically as the number of processors goes to infinity.

Acknowledgements: We thank the editors and anonymous referees for very helpful reports that significantly improved the presentation of our results.

References

[1] C. Andrieu and E. Moulines (2007), On the ergodicity properties of some Markov chain Monte Carlo algorithms. Ann. Appl. Prob. 44, 2.
[2] C. Andrieu and C.P. Robert (2001), Controlled MCMC for optimal sampling. Preprint.
[3] Y.F. Atchadé (2006), An adaptive version of the Metropolis-adjusted Langevin algorithm with a truncated drift. Meth. and Comp. in Appl. Prob. 8, 2.
[4] Y.F. Atchadé and J.S.
Liu (2004), The Wang-Landau algorithm in general state spaces: applications and convergence analysis. Statistica Sinica, to appear.
[5] B. Cooke and S.C. Schmidler (2008), Preserving the Boltzmann ensemble in replica-exchange molecular dynamics. J. Chem. Phys. 129.
[6] D.J. Earl and M.W. Deem (2005), Parallel tempering: theory, applications, and new perspectives. J. Phys. Chem. B 108.
[7] C.J. Geyer (1991), Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface.
[8] P.W. Glynn and P. Heidelberger (1991), Analysis of parallel, replicated simulations under a completion time constraint. ACM Trans. on Modeling and Simulation 1.
[9] W.K. Hastings (1970), Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57.
[10] Y. Iba (2001), Extended ensemble Monte Carlo. Int. J. Mod. Phys. C 12(5).
[11] D.A. Kofke (2002), On the acceptance probability of replica-exchange Monte Carlo trials. J. Chem. Phys. 117. Erratum: J. Chem. Phys. 120.
[12] D.A. Kofke (2004), Comment on reference [18]. J. Chem. Phys. 121, 1167.
More informationSYDE 112, LECTURE 1: Review & Antidifferentiation
SYDE 112, LECTURE 1: Review & Antiifferentiation 1 Course Information For a etaile breakown of the course content an available resources, see the Course Outline. Other relevant information for this section
More information18 EVEN MORE CALCULUS
8 EVEN MORE CALCULUS Chapter 8 Even More Calculus Objectives After stuing this chapter you shoul be able to ifferentiate an integrate basic trigonometric functions; unerstan how to calculate rates of change;
More informationLATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION
The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische
More informationLagrangian and Hamiltonian Mechanics
Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical
More information3.2 Differentiability
Section 3 Differentiability 09 3 Differentiability What you will learn about How f (a) Might Fail to Eist Differentiability Implies Local Linearity Numerical Derivatives on a Calculator Differentiability
More informationNOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,
NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which
More informationEntanglement is not very useful for estimating multiple phases
PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The
More informationA Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential
Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix
More informationCascaded redundancy reduction
Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,
More informationSome Examples. Uniform motion. Poisson processes on the real line
Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]
More informationChapter 4. Electrostatics of Macroscopic Media
Chapter 4. Electrostatics of Macroscopic Meia 4.1 Multipole Expansion Approximate potentials at large istances 3 x' x' (x') x x' x x Fig 4.1 We consier the potential in the far-fiel region (see Fig. 4.1
More informationELECTRON DIFFRACTION
ELECTRON DIFFRACTION Electrons : wave or quanta? Measurement of wavelength an momentum of electrons. Introuction Electrons isplay both wave an particle properties. What is the relationship between the
More information7.1 Support Vector Machine
67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to
More informationd dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1
Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of
More informationA PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,
More informationLeast-Squares Regression on Sparse Spaces
Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction
More informationIntroduction to Markov Processes
Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav
More informationQF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim
QF101: Quantitative Finance September 5, 2017 Week 3: Derivatives Facilitator: Christopher Ting AY 2017/2018 I recoil with ismay an horror at this lamentable plague of functions which o not have erivatives.
More informationConservation Laws. Chapter Conservation of Energy
20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action
More informationImplicit Differentiation
Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,
More informationLower bounds on Locality Sensitive Hashing
Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,
More informationAgmon Kolmogorov Inequalities on l 2 (Z d )
Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,
More informationLectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs
Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent
More informationBohr Model of the Hydrogen Atom
Class 2 page 1 Bohr Moel of the Hyrogen Atom The Bohr Moel of the hyrogen atom assumes that the atom consists of one electron orbiting a positively charge nucleus. Although it oes NOT o a goo job of escribing
More informationSharp Thresholds. Zachary Hamaker. March 15, 2010
Sharp Threshols Zachary Hamaker March 15, 2010 Abstract The Kolmogorov Zero-One law states that for tail events on infinite-imensional probability spaces, the probability must be either zero or one. Behavior
More informationSwitching Time Optimization in Discretized Hybrid Dynamical Systems
Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set
More informationLecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations
Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:
More informationUniversity of Toronto Department of Statistics
Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704
More informationAcute sets in Euclidean spaces
Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of
More informationMarkov Chains in Continuous Time
Chapter 23 Markov Chains in Continuous Time Previously we looke at Markov chains, where the transitions betweenstatesoccurreatspecifietime- steps. That it, we mae time (a continuous variable) avance in
More informationVectors in two dimensions
Vectors in two imensions Until now, we have been working in one imension only The main reason for this is to become familiar with the main physical ieas like Newton s secon law, without the aitional complication
More information1 Math 285 Homework Problem List for S2016
1 Math 85 Homework Problem List for S016 Note: solutions to Lawler Problems will appear after all of the Lecture Note Solutions. 1.1 Homework 1. Due Friay, April 8, 016 Look at from lecture note exercises:
More informationQubit channels that achieve capacity with two states
Qubit channels that achieve capacity with two states Dominic W. Berry Department of Physics, The University of Queenslan, Brisbane, Queenslan 4072, Australia Receive 22 December 2004; publishe 22 March
More informationMulti-View Clustering via Canonical Correlation Analysis
Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in
More informationSituation awareness of power system based on static voltage security region
The 6th International Conference on Renewable Power Generation (RPG) 19 20 October 2017 Situation awareness of power system base on static voltage security region Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran
More informationRobust Low Rank Kernel Embeddings of Multivariate Distributions
Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions
More informationarxiv: v4 [math.pr] 27 Jul 2016
The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,
More information26.1 Metropolis method
CS880: Approximations Algorithms Scribe: Dave Anrzejewski Lecturer: Shuchi Chawla Topic: Metropolis metho, volume estimation Date: 4/26/07 The previous lecture iscusse they some of the key concepts of
More informationNon-Linear Bayesian CBRN Source Term Estimation
Non-Linear Bayesian CBRN Source Term Estimation Peter Robins Hazar Assessment, Simulation an Preiction Group Dstl Porton Down, UK. probins@stl.gov.uk Paul Thomas Hazar Assessment, Simulation an Preiction
More informationMath 210 Midterm #1 Review
Math 20 Miterm # Review This ocument is intene to be a rough outline of what you are expecte to have learne an retaine from this course to be prepare for the first miterm. : Functions Definition: A function
More information