YVES F. ATCHADÉ, GARETH O. ROBERTS, AND JEFFREY S. ROSENTHAL

TOWARDS OPTIMAL SCALING OF METROPOLIS-COUPLED MARKOV CHAIN MONTE CARLO

YVES F. ATCHADÉ, GARETH O. ROBERTS, AND JEFFREY S. ROSENTHAL

Abstract. We consider optimal temperature spacings for Metropolis-coupled Markov chain Monte Carlo (MCMCMC) and Simulated Tempering algorithms. We prove that, under certain conditions, it is optimal (in terms of maximising the expected squared jumping distance) to space the temperatures so that the proportion of temperature swaps which are accepted is approximately 0.234. This generalises related work by physicists, and is consistent with previous work about optimal scaling of random-walk Metropolis algorithms.

1. Introduction

The Metropolis-coupled Markov chain Monte Carlo (MCMCMC) algorithm [7], also known as parallel tempering or the replica exchange method, is a version of the Metropolis-Hastings algorithm [17, 9] which is very effective in dealing with problems of multi-modality. The algorithm works by simulating multiple copies of the target (stationary) distribution, each at a different temperature. Through a swap move, the algorithm allows copies at lower temperatures to borrow information from high-temperature copies and thus escape from local modes. The performance of MCMCMC depends crucially on the temperatures used for the different copies. There is an ongoing discussion (especially in the physics literature) on how best to select these temperatures. In this note, we show that this question can be partially addressed using the optimal scaling framework initiated in [20].

MCMCMC arises as follows. We are interested in sampling from a target probability distribution π(·) having (complicated, probably multimodal, and possibly un-normalised) target density f(x) on some state space X, which is an open subset of R^d for some (large) dimension d. We define a sequence of associated tempered distributions f^(β_j)(x), where 0 ≤ β_n < β_{n−1} < ... < β_1 < β_0 = 1 are the selected inverse-temperature values, subject to the restriction that f^(β_0)(x) = f^(1)(x) = f(x).
(Usually the densities f^(β_j) are simply powers of the original density, i.e. f^(β_j)(x) = (f(x))^{β_j}; in this case, if X has infinite volume, then we require β_n > 0.)

Date: July 30, 2009; last revised April. Mathematics Subject Classification. 60J10, 65C05. Key words and phrases. Metropolis-Coupled MCMC, Simulated tempering, Optimal scaling.
Y. Atchadé: Department of Statistics, University of Michigan, 1085 South University, Ann Arbor, 48109, MI, United States. Address: yvesa@umich.edu.
G. O. Roberts: Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK. Address: gareth.o.roberts@warwick.ac.uk.
J. S. Rosenthal: Department of Statistics, University of Toronto, Toronto, Ontario, Canada, M5S 3G3. Address: jeff@math.toronto.edu. Partially supported by NSERC of Canada.

MCMCMC proceeds by running one chain at each of the n + 1 values of β. It has state space X^{n+1}, with unnormalised stationary measure ∏_{j=0}^n f^(β_j)(x_j), where each x_j corresponds to a chain at the fixed inverse temperature β_j having stationary density f^(β_j)(x_j). In particular, the β_0 chain has stationary density f^(β_0)(x_0) = f(x_0), so that after a long run the samples x_0 should approximately correspond to the density f(x), which is the density of actual interest. The hope is that the chains corresponding to smaller β (i.e., to higher temperatures) can mix more easily, and can then lend this mixing information to the β_0 chain, thus speeding up convergence of the x_0 values to the density f(x).

To achieve this, MCMCMC alternates between two different types of dynamics. On some iterations, it attempts a within-temperature move, by updating each x_j according to some type of MCMC update (e.g., a usual random-walk Metropolis algorithm) for which f^(β_j) is a stationary density. On other iterations, it attempts a temperature swap, which consists of choosing two different inverse temperatures, say β_j and β_k, and then proposing to swap their respective chain values, i.e. to interchange the current values of x_j and x_k. This proposed swap is then accepted according to the usual (symmetric) Metropolis algorithm probabilities, i.e. it is accepted with probability

    min( 1, [f^(β_j)(x_k) f^(β_k)(x_j)] / [f^(β_j)(x_j) f^(β_k)(x_k)] ),    (1)

otherwise it is rejected and the values of x are left unchanged. Such algorithms lead to many interesting questions and have been widely studied, see e.g. [7, 11, 12, 18, 6, 13, 15, 5, 26, 27]. This note will concentrate on the specific question of optimising the choice of the β values to achieve maximal efficiency in the temperature swaps.
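As a concrete illustration (ours, not from the paper), the temperature-swap step with acceptance probability (1) can be sketched in Python, working in log space for numerical stability; the function and argument names here are hypothetical:

```python
import math, random

def swap_move(x, betas, log_f, j, k):
    """Propose swapping the chain values at inverse temperatures betas[j] and
    betas[k], accepting with the probability in (1).  In log space the ratio
    in (1) simplifies to (beta_j - beta_k) * (log f(x_k) - log f(x_j))."""
    log_ratio = (betas[j] - betas[k]) * (log_f(x[k]) - log_f(x[j]))
    if math.log(random.random()) < min(0.0, log_ratio):
        x[j], x[k] = x[k], x[j]   # swap accepted: interchange chain values
        return True
    return False                  # swap rejected: x is left unchanged
```

Note that the swap only exchanges the two chain values; the β ladder itself is fixed.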
Specifically, suppose we wish to design the algorithm such that the chain at inverse temperature β will propose to swap with another chain at inverse temperature β + ɛ. What choice of ɛ is best? Obviously, if ɛ is very large, then such swaps will usually be rejected. Similarly, if ɛ is very small, then such swaps will usually be accepted, but will not greatly improve mixing. Hence, the optimal ɛ is somewhere between the two extremes (this is sometimes called the Goldilocks Principle), and our task is to identify it.

Our main result (Theorem 1) will show that, in a certain limiting sense under certain assumptions, it is optimal (in terms of maximising the expected squared jumping distance of the influence of hotter chains on colder ones) to choose the spacing ɛ such that the probability of such a swap being accepted is equal to 0.234. This is the same optimal acceptance probability derived previously for certain random-walk Metropolis algorithms [20, 22], and generalises some results in the physics literature [11, 18]. It also has connections (Section 5.1) to the Dirichlet form of an associated temperature process. We shall also consider (Section 4) the related Simulated Tempering algorithm, and shall prove that under certain conditions the optimal acceptance rate 0.234 applies there as well. We shall also compare (Corollary 1) Simulated Tempering to MCMCMC, and see that in a certain sense the former is exactly twice as efficient as the latter, but there are various mitigating factors, so the comparison is far from clear-cut.

1.1. Toy Example. To illustrate the importance of temperature spacings, and of measuring the influence of hotter chains on colder ones, we consider a very simple toy example. Suppose the state space consists of just 101 discrete points, X = {0, 1, 2, 3, ..., 100}, with (un-normalised) target density (with respect to counting measure on X) given by π{x} = 2^{−x} + 2^{−(100−x)} for x ∈ X. That is, π has two modes, at 0 and 100, with a virtually insurmountable barrier of low-probability states between them (Figure 1, top-left). Let the tempered densities be given (as usual) by powers of π, i.e. f^(β)(x) = (π(x))^β. Thus, for very small β, the insurmountable barrier is nicely removed (Figure 1, top-middle). Can we make use of such tempered densities to improve convergence of the original chain to π?

To be specific, suppose each fixed-β chain is a random-walk Metropolis algorithm with proposal kernel q(x, x+1) = q(x, x−1) = 1/2. Thus, the cold (large β) chains are virtually incapable of moving from state 0 to state 100 or back (Figure 1, top-left), but the hot (small β) chains have no such obstacle (Figure 1, top-middle). The question is, what values of the inverse temperatures β will best allow an MCMCMC algorithm to benefit from the rapid mixing of the hot chains, to provide good mixing for the cold chain?

Figure 1: The toy example's target density (top-left), tempered density at a small inverse temperature β (top-middle), random-walk Metropolis run (top-right), and trace plots of the cold chain over 10,000 full iterations of MCMCMC runs with two temperatures (bottom-left), ten temperatures (bottom-middle), and fifty temperatures (bottom-right).

To test this, we ran an MCMCMC algorithm for 10,000 full iterations (each consisting of one update at each inverse temperature, plus one attempted swap of adjacent inverse-temperature values), in each of four settings: with just one inverse temperature, i.e. an ordinary MCMC algorithm with no temperature swaps (Figure 1, top-right); with two inverse temperatures (Figure 1, bottom-left); with ten geometrically spaced inverse temperatures β_j, j = 0, 1, 2, ..., 9 (Figure 1, bottom-middle); and with fifty geometrically spaced inverse temperatures β_j, j = 0, 1, 2, ..., 49 (Figure 1, bottom-right). We started each run with all chains at the state 0, to investigate the extent to which they were able to mix well and effectively sample from the equally-large mode at state 100.

The results of our runs were that the ordinary MCMC algorithm with no temperature swaps did not traverse the barrier at all, and just stayed near the state 0 (Figure 1, top-right). Adding one additional temperature improved this somewhat (Figure 1, bottom-left), and the cold chain was now able to sometimes sample from states near 100, but only occasionally. Using ten temperatures improved this greatly and led to quite good mixing of the cold chain (Figure 1, bottom-middle). Perhaps most interestingly, using lots more temperatures (fifty) did not improve mixing further, but actually made it much worse (Figure 1, bottom-right), despite the greatly increased computation time (since each temperature requires its own chain values to be updated at each iteration). So, we see that in this example, ten geometrically spaced temperatures performed much better than two temperatures (too few) or fifty temperatures (too many). Intuitively, with only two temperatures, it is too difficult for the algorithm to effectively swap values between adjacent temperatures (indeed, only about 5% of proposed swaps were accepted).
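A minimal simulation of this toy example can be sketched as follows (our own code; the exact inverse-temperature ladder below, 0.01^{j/9}, is an assumption made for illustration):

```python
import math, random

def log_pi(x):
    # un-normalised log target: pi{x} = 2^(-x) + 2^(-(100 - x)) on {0, ..., 100}
    return math.log(2.0 ** -x + 2.0 ** -(100 - x))

def mcmcmc(betas, n_iter, seed=1):
    """One full iteration = one random-walk Metropolis update (proposal x +/- 1)
    at each inverse temperature, plus one attempted swap of a random adjacent
    pair, accepted as in (1).  Returns the trace of the cold (beta = 1) chain."""
    rng = random.Random(seed)
    x = [0] * len(betas)                   # start every chain in the mode at 0
    trace = []
    for _ in range(n_iter):
        for j, b in enumerate(betas):      # within-temperature moves
            y = x[j] + rng.choice((-1, 1))
            if 0 <= y <= 100 and math.log(rng.random()) < b * (log_pi(y) - log_pi(x[j])):
                x[j] = y
        if len(betas) > 1:                 # attempted adjacent temperature swap
            j = rng.randrange(len(betas) - 1)
            log_r = (betas[j] - betas[j + 1]) * (log_pi(x[j + 1]) - log_pi(x[j]))
            if math.log(rng.random()) < min(0.0, log_r):
                x[j], x[j + 1] = x[j + 1], x[j]
        trace.append(x[0])
    return trace

# ten geometrically spaced inverse temperatures from 1 down to 0.01 (our choice)
betas = [0.01 ** (j / 9) for j in range(10)]
trace = mcmcmc(betas, 10000)
```

With a single inverse temperature the trace stays trapped near the mode at 0, as in the top-right panel of Figure 1.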
By contrast, with fifty temperatures, virtually all swaps are accepted, but too many swaps are required before the rapidly-mixing hot chains can have much influence over the values of the cold chain, so this again does not lead to an efficient algorithm. Best is a compromise choice, ten temperatures, which makes the temperature differences small enough so swaps are easily accepted, but still large enough that they lead to efficient interchange between hot and cold chains and thus to efficient convergence of the cold chain to π. That is, we would like the hot and cold chain values to be able to influence each other effectively, which requires lots of successful swaps of significant distance in the inverse-temperature domain.

This leads to the question of how to select the number and spacing of the inverse-temperature values for a given problem, which is the topic of this paper. To maximise the influence of the hot chain on the cold chain, we want to maximise the effective speed with which the chain values move along in the inverse-temperature domain. We shall do this by means of the expected squared jumping distance (ESJD) of this influence, defined below. Our main result (Theorem 1 below) says that to maximise the ESJD, it is optimal (under certain assumptions) to select the inverse temperatures so that the probability of accepting a proposed swap between adjacent inverse-temperature values is approximately 23%.

2. Optimal Temperature Spacings for MCMCMC

In this section, we consider the question of optimal temperature choice for MCMCMC algorithms. To fix ideas, we consider the specific situation in which the algorithm is proposing to swap the chain values at two specific inverse temperatures,

namely β and β + ɛ, where β, ɛ > 0 and β + ɛ ≤ 1. We shall consider the question of what value of ɛ is optimal in terms of maximising the influence of the hot chain on the cold chain. We shall focus on the asymptotic situation in which the chain has already converged to its target stationary distribution, i.e. that x ~ ∏_{j=0}^n f^(β_j).

To measure this precisely, let us define γ = β + ɛ if the swap is accepted, or γ = β if the swap is rejected. Then γ is an indication of where the β chain's value has travelled to in the inverse-temperature space. That is, if the proposed swap is accepted, then the value moves from β to γ = β + ɛ, for a total inverse-temperature distance of γ − β = ɛ, while if the swap is rejected then it does not move at all, i.e. the distance it moves is γ − β = 0. Hence, γ − β indicates the extent to which the swap proposal succeeded in moving different x values around the β space, which is essential for getting benefit from the MCMCMC algorithm. So, the larger the magnitude of γ − β, the more efficient are the swap moves at providing mixing in the temperature domain. We therefore define the optimal choice of ɛ to be the one which maximises the stationary (i.e. asymptotic) expected squared jumping distance (ESJD) of this movement, i.e. which maximises

    ESJD = E_π[(γ − β)²] = ɛ² E_π[P(swap accepted)] ≡ ɛ² · ACC,    (2)

where from (1),

    ACC = E_π[P(swap accepted)] = E_π[ min( 1, [f^(β_j)(x_k) f^(β_k)(x_j)] / [f^(β_j)(x_j) f^(β_k)(x_k)] ) ],    (3)

and where the expectations are with respect to the stationary (asymptotic) distribution x ~ ∏_{j=0}^n f^(β_j). We shall also consider the generalisation of (2) to

    ESJD(h) = E_π[(h(γ) − h(β))²]    (4)

for some h : [0, 1] → R; then (2) corresponds to the case where h is the identity function. Of course, (2) and (4) represent just some possible measures of optimality, and other measures might not necessarily lead to equivalent optimisations; for some discussion related to this issue see Section 5.
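For intuition, ACC of (3) and the resulting ESJD of (2) can be estimated by simple Monte Carlo. The sketch below is our own illustration under an idealisation: a product standard-normal target (as in the later Gaussian example), with exact sampling from each tempered density rather than MCMC:

```python
import math, random

def acc_and_esjd(beta, eps, d, n=20000, seed=5):
    """Monte-Carlo estimate of ACC (3) and ESJD (2) for a proposed swap between
    inverse temperatures beta and beta + eps, for a product standard-normal
    target: each coordinate of f^(b) is N(0, 1/b), and
    g(x) = sum_i log f(x_i) = -sum_i x_i^2 / 2 (up to an additive constant)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        gx = -0.5 * sum(rng.gauss(0, 1 / math.sqrt(beta)) ** 2 for _ in range(d))
        gy = -0.5 * sum(rng.gauss(0, 1 / math.sqrt(beta + eps)) ** 2 for _ in range(d))
        acc += math.exp(min(0.0, eps * (gx - gy)))   # min(1, e^B), B = eps*(g(x)-g(y))
    acc /= n
    return acc, eps * eps * acc
```

Increasing eps lowers the estimated acceptance rate while increasing the per-swap jump, which is exactly the trade-off that (2) quantifies.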
To make progress on computing the optimal ɛ, we restrict (following [20, 22]) to the special case where

    f(x) = ∏_{i=1}^d f(x_i),    (5)

i.e. the target density takes on a special product form. (Although (5) is a very restrictive assumption, it is known [20, 22] that conclusions drawn from this special case are often approximately applicable in much broader contexts.) We also assume that the tempered distributions are simply powers of the original density (which is the usual case), i.e. that

    f^(β)(x) = ∏_{i=1}^d f^(β)(x_i) = ∏_{i=1}^d (f(x_i))^β.    (6)

Intuitively, as the dimension d gets large, so that small changes in β lead to larger and larger changes in f^(β), the inverse-temperature spread ɛ must decrease to preserve a non-vanishing probability of accepting a proposed swap. Hence, similarly

to [20, 22], we shall consider the limit as d → ∞ and correspondingly ɛ → 0. To get a non-trivial limit, we take

    ɛ = l d^{−1/2}    (7)

for some positive constant l to be determined. (Choosing a smaller scaling would correspond to taking l → 0, while choosing a larger scaling would correspond to letting l → ∞; either choice is sub-optimal, since the optimal l will be strictly between 0 and ∞, as we shall see.)

2.1. Main Result. Under these assumptions, we shall prove the following (where Φ(z) = (1/√(2π)) ∫_{−∞}^z e^{−s²/2} ds is the cdf of a standard normal):

Theorem 1. Consider the MCMCMC algorithm as described above, assuming (5), (6), and (7). Then as d → ∞, the ESJD of (2) is maximised when l is chosen to maximise l² · 2Φ(−l [I(β)/2]^{1/2}), where I(β) > 0 is defined in (10) below. Furthermore, for this optimal choice of l, the corresponding probability of accepting a proposed swap is given (to three decimal points) by 0.234. In fact, the relationship between the ESJD of (2) and the ACC of (3) is given by

    ESJD = (2/I(β)) · ACC · (Φ^{−1}(ACC/2))²    (8)

(see Figure 2). Finally, the optimal choice of l also maximises ESJD(h) of (4), for any differentiable function h : [0, 1] → R.

Figure 2: A graph of the relationship between the expected squared jumping distance (ESJD, "Efficiency") and the asymptotic acceptance rate (ACC), as described in equation (8), in units where I(β) = 1.

Proof. For this MCMCMC algorithm, the acceptance probability of a temperature swap is given by α = 1 ∧ e^B,

where

    e^B = [f^(β)(y) f^(β+ɛ)(x)] / [f^(β)(x) f^(β+ɛ)(y)],

and where x ~ f^(β) and y ~ f^(β+ɛ) are independent and in stationarity. We write g = log f, so that g^(β)(x) = log f^(β)(x), etc. We then have that

    B = [g^(β)(y) − g^(β+ɛ)(y)] − [g^(β)(x) − g^(β+ɛ)(x)] ≡ T(y) − T(x),

where

    T(x) = g^(β)(x) − g^(β+ɛ)(x) = β g(x) − (β + ɛ) g(x) = −ɛ g(x) = −ɛ Σ_{i=1}^d g(x_i).

Now, write E_β for expectation with respect to the distribution having density proportional to f^β, and similarly for Var_β. Then in this distribution, g has mean

    E_β(g) = ∫ log f(x) f^β(x) dx / ∫ f^β(x) dx ≡ M(β),    (9)

and variance

    Var_β(g) = ∫ (log f(x))² f^β(x) dx / ∫ f^β(x) dx − M(β)² ≡ I(β).    (10)

Hence, in the distribution proportional to f^β, T(x) has mean E_β(T(x)) = −dɛ M(β) ≡ μ(β), and variance Var_β(T(x)) = dɛ² I(β) ≡ σ(β)². Now, taking the derivative with respect to β (using the quotient rule),

    M′(β) = ∫ (log f(x))² f^β(x) dx / ∫ f^β(x) dx − ( ∫ log f(x) f^β(x) dx / ∫ f^β(x) dx )² = I(β),    (11)

from which it follows that μ′(β) = −dɛ M′(β) = −dɛ I(β) = −σ(β)²/ɛ.

Now, recall that ɛ = l d^{−1/2}, so that σ(β)² = dɛ² I(β) = l² I(β). Hence, T(y) − T(x) = −l d^{−1/2} Σ_{i=1}^d [g(y_i) − g(x_i)]. We claim that, as d → ∞, T(y) − T(x) converges weakly to a random variable A ~ N(−σ(β)², 2σ(β)²). To see this, let φ_d(t) be the characteristic function of T(y) − T(x). Then we have

    φ_d(t) = E[e^{it(T(y) − T(x))}] = e^{−itɛd[M(β+ɛ) − M(β)]} { E(e^{−itɛ ḡ(y_1)}) E(e^{itɛ ḡ(x_1)}) }^d,

where ḡ denotes g minus its mean under the relevant distribution. Since M(β+ɛ) − M(β) = ɛ M′(β) + o(ɛ) and ɛ = l d^{−1/2}, it follows that lim_{d→∞} −itɛd[M(β+ɛ) − M(β)] = −itl² I(β). Hence, using a Taylor series expansion,

    E(e^{−itɛḡ(y_1)}) E(e^{itɛḡ(x_1)}) = (1 − (l²t²/2d) I(β+ɛ) + o(1/d)) (1 − (l²t²/2d) I(β) + o(1/d)) = 1 − (l²t²/d) I(β) + o(1/d).

Hence, φ_d(t) converges to e^{−itl²I(β)} e^{−l²t²I(β)} as d → ∞. This limiting function is the characteristic function of the distribution N(−l²I(β), 2l²I(β)), thus proving the claim.

To continue, recall (e.g. [20], Proposition 2.4) that if A ~ N(m, s²), then

    E(1 ∧ e^A) = Φ(m/s) + e^{m + s²/2} Φ(−s − m/s).

In particular, if A ~ N(−c, 2c) (so that E(e^A) = 1), then E(1 ∧ e^A) = 2Φ(−√(c/2)). Since 1 ∧ e^B is a bounded function, it follows that as d → ∞, the acceptance rate of MCMCMC converges to

    E(α) = E(1 ∧ e^B) → E[1 ∧ exp( N(−σ(β)², 2σ(β)²) )] = 2Φ(−σ(β)/√2).

Now, with ɛ = l d^{−1/2}, we have ɛ² = l²/d and σ(β)²/2 = dɛ²I(β)/2 = l²I(β)/2, whence σ(β)/√2 = l [I(β)/2]^{1/2}. Then as d → ∞,

    P(accept swap) = E(1 ∧ e^B) → 2Φ(−l [I(β)/2]^{1/2}),

and so

    ESJD = ɛ² P(accept swap) ≈ (l²/d) · 2Φ(−l [I(β)/2]^{1/2}) ≡ (1/d) e_MC(l).    (12)

Hence, maximising the ESJD is equivalent to choosing l = l_opt to maximise l² · 2Φ(−l [I(β)/2]^{1/2}), with the second factor being the acceptance probability. It then follows as in [20, 22] that when l = l_opt, the acceptance probability becomes 0.234 (to three decimal places). Indeed, making the substitution u = l [I(β)/2]^{1/2} shows that finding l_opt is equivalent to finding the value û of u which maximises u² Φ(−u), and then evaluating â = 2Φ(−û). It follows that the value of û, and hence also the value of â, does not depend on the value of I(β) (provided I(β) > 0). So, it suffices to assume I(β) = 1, in which case we compute numerically that û ≈ 1.19, so that â = 2Φ(−û) ≈ 0.234.

Finally, if instead of the ESJD we consider ESJD(h) ≡ E[(h(γ) − h(β))²] for some differentiable function h, then as ɛ → 0 we have (h(γ) − h(β))² ≈ [h′(β)]² (γ − β)², so ESJD(h) ≈ [h′(β)]² ESJD. Hence, as a function of ɛ, maximising ESJD(h) is equivalent to maximising the ESJD, and is thus maximised at the same value ɛ_opt. □

2.2. Iteratively Selecting the Inverse Temperatures.
The above results show that, in high dimensions under certain assumptions, it is most efficient to choose the inverse temperatures for MCMCMC such that the average acceptance probability between any two adjacent temperatures is about 23%. This leads to the question of how to select such temperatures. Although this may not be a practical approach in general, we shall adopt an intensive iterative approach to this in order to assess the effects of our theory in applications. We assume we know (perhaps by running some preliminary simulations) some sufficiently small inverse-temperature value β such that the mixing of the chain with corresponding density f^β is sufficiently fast. This value β shall be our minimal inverse temperature. (In the examples below, we use β = 0.01.)
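The constants behind this 23% figure come from the one-dimensional optimisation in the proof of Theorem 1. Writing u = l [I(β)/2]^{1/2} and taking units where I(β) = 1, a simple grid search (our own numerical check, not from the paper) recovers them:

```python
import math

def Phi(z):
    # standard normal cdf, written via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# maximise u^2 * Phi(-u) over a fine grid; u_hat is the optimal u of Theorem 1
u_hat = max((i * 1e-4 for i in range(1, 50001)), key=lambda u: u * u * Phi(-u))
a_hat = 2.0 * Phi(-u_hat)   # induced optimal acceptance rate, approx. 0.234
```

The maximiser comes out near û ≈ 1.19, giving â ≈ 0.234, independently of I(β).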

In terms of β, we construct the remaining β_j values iteratively, as follows. First, starting with β_0 = 1, we find β_1 such that the acceptance probability of a swap between β_0 = 1 and β_1 is approximately 0.234. Then, given β_1, we find β_2 such that the acceptance probability of the swap between β_1 and β_2 is again approximately 0.234. We continue in this way to construct β_3, β_4, ..., continuing until we have β_j ≤ β for some j. At that point, we replace β_j by β and stop.

To implement this iterative algorithm, we need to find for each inverse temperature β a corresponding inverse temperature β′ < β such that the average acceptance probability of the swaps between β and β′ is approximately 0.234. We use a simulation-based approach for this, via random variables {(X_n, X′_n, ρ_n)}_{n≥0}. After initialisation for n = 0, then at each time n ≥ 1, we let β′_n = β(1 + e^{ρ_n})^{−1}, and draw X_{n+1} ~ f^β and X′_{n+1} ~ f^{β′_n} (or update them from some Markov chain dynamics preserving those distributions). The probability of accepting a swap between β and β′_n is then given by α_{n+1} = 1 ∧ e^{B_{n+1}}, where B_{n+1} = (β′_n − β)(g(X_{n+1}) − g(X′_{n+1})). We then attempt to converge towards 0.23 by replacing ρ_n with ρ_{n+1} = ρ_n + n^{−1}(α_{n+1} − 0.23), i.e. using a stochastic approximation algorithm [19, 2]. This ensures that if α_{n+1} > 0.23 then β′_n will decrease, while if α_{n+1} < 0.23 then β′_n will increase, as it should. We shall use this iterative simulation approach to determine the inverse temperatures {β_j} in all of our simulation examples in Section 3 below.

2.3. Comparison with Geometric Temperature Spacing. In some cases, it is believed that the optimal choice of temperatures is geometric, i.e. that we want β_{j+1} = cβ_j for an appropriate constant c. Now, the notation of Theorem 1 corresponds to setting β_j = β + ɛ and β_{j+1} = β. So, it is optimal to have β_{j+1} = cβ_j precisely when ɛ_opt ∝ β in Theorem 1. Recall now that, from the end of the proof of Theorem 1, û ≡ l_opt [I(β)/2]^{1/2} ≈ 1.19, and in particular û does not depend on β.
Then l_opt = û [2/I(β)]^{1/2}. So ɛ_opt = l_opt d^{−1/2} = û d^{−1/2} [2/I(β)]^{1/2} ∝ I(β)^{−1/2}. It follows that the condition ɛ_opt ∝ β is equivalent to the condition I(β)^{−1/2} ∝ β, i.e. I(β) ∝ β^{−2}. While this does hold for some examples, e.g. the case f(x) = e^{−|x|^r} (see Section 2.4 below), it does not hold for most other examples (e.g. for the Gamma distribution, various log x terms appear which do not lead to such a simple relationship). Thus, the optimal spacing of inverse temperatures is not necessarily geometric. In our simulations below, we consider both geometric spacings, and spacings found using our 23% rule. We shall see that the 23% rule leads to superior performance.

2.4. A Simple Analytical Example. Consider the specific example where the target density is given by (5) with f(x) = e^{−|x|^r} for some fixed r > 0. (This includes the Gaussian (r = 2) and Double-Exponential (r = 1) cases.) For this example, f^β(x) ∝ e^{−β|x|^r}, and g(x) = −|x|^r. We then compute from (9) that

    M(β) = E_β(g) = ∫ g(x) f^β(x) dx / ∫ f^β(x) dx = −∫ |x|^r e^{−β|x|^r} dx / ∫ e^{−β|x|^r} dx

        = −[2 Γ(1/r) β^{−(1+r)/r} / r²] / [2 Γ(1 + 1/r) β^{−1/r}] = −1/(rβ),

where Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt is the gamma function. Then by (11), I(β) = Var_β(g) = M′(β) = 1/(rβ²) ∝ β^{−2}. Hence, since I(β) ∝ β^{−2}, it follows as above that the spread of inverse temperatures should be geometric, with β_{j+1} = cβ_j for some constant c.

2.5. Relation to Previous Literature. The issue of temperature spacings and acceptance rates for MCMCMC has been widely studied in the physics literature [11, 12, 18, 6, 13, 5, 26, 27]. For example, the conclusion of Theorem 1, that the inverse temperatures should be chosen to make the swap acceptance probabilities all equal to the same value (here 0.234), is related to the uniform exchange rates criterion described in [10]. In addition, simulation studies [6] have suggested tuning the temperatures so that the swap acceptance rate is about 20%, and analytic studies [13] have suggested an optimal rate of 0.23 in some specific circumstances, as we now discuss.

In physics, the canonical ensemble corresponds to writing f^(β)(x) = e^{−βV(x)}, where V is the energy function and 1/β is the temperature. Thus, in our notation, V(x) = −g(x). Furthermore, the average energy at inverse temperature β is given by

    U(β) = ∫ V(x) e^{−βV(x)} dx / ∫ e^{−βV(x)} dx.

Physicists have approached the problem using the concepts of density of states and heat capacity. If the state space X is discrete, the density of states is defined as the function G(u) := #{x ∈ X : V(x) = u}. If X is continuous, then G(u) is defined to be the (d−1)-dimensional Lebesgue measure of the level set {x ∈ X : V(x) = u}. The heat capacity of the system at inverse temperature β is defined as

    C(β) = −β² ∂U(β)/∂β = β² Var_β(V(X)).

In our notation, Var_β(V(X)) = Var_β(−g(X)) = I(β), so C(β) = β² I(β). Hence, C(β) is constant if and only if I(β) ∝ β^{−2}, as discussed in Section 2.3. Previous literature has considered, as have we, the idealised case in which exact sampling is done from each distribution f^β.
In this case, a swap in MCMCMC between β and β′ (β < β′) has acceptance probability

    A(β, β′) = E[ min( 1, e^{(β−β′)(V(X) − V(X′))} ) ],

with the expectation taken with respect to X ~ f^β and X′ ~ f^{β′}. Assuming that the heat capacity is constant (i.e., C(β) ≡ C), and that the density of states has the form

    G(u) = ( 1 + (β_0/C)(u − u_0) )^C G(u_0),

[11] shows that the average acceptance probability of a swap between β and β′ > β is given by:

    A(β, β′) = [2 Γ(2C + 2) / Γ(C + 1)²] ∫_0^{β/β′} u^C (1 + u)^{−2(C+1)} du = [2 / B(C + 1, C + 1)] ∫_0^{1/(1+R)} θ^C (1 − θ)^C dθ,    (13)

where R = β′/β > 1, Γ(x) is the Gamma function and B(a, b) the Beta function. Assuming a constant heat capacity and a more specific form for the density of states, G(u) = [(2π)^{C+1}/Γ(C + 1)] u^C, [18] derived the same formula as in (13) for A(β, β′), which they call the incomplete beta function law for parallel tempering. Letting C (and thus the dimension of the system) go to infinity, [12, 18] used (13) to derive a limiting expression for the acceptance probability of MCMCMC similar to that obtained in (12) above. [13] uses these limits to argue that 0.23 is approximately the optimal asymptotic swap acceptance rate, using arguments somewhat similar to ours (indeed, Figure 1 of [13] contains the same acceptance-versus-efficiency curve implied by (12)).

Now, it seems that the heat capacity C(β) being constant is quite a restrictive assumption; it is satisfied for the example f(x) = e^{−|x|^r} considered in Section 2.4, but is usually not satisfied for more complicated functions. By contrast, our assumption (6) does not impose any special conditions on the nature of the density function f(x).

3. Simulation examples

3.1. A Simple Gaussian Example. We illustrate Theorem 1 with the following simulation example. We implement the MCMCMC described above for Gaussian target and tempered distributions. Specifically, we consider just two inverse temperatures, β_0 ≡ 1 and β_1 with 0 < β_1 < 1. We define f(x) and f^(β)(x) as in (5) and (6), with f the density of a standard normal distribution N(0, 1), so that f^(β) = N(0, β^{−1}). We use a version of the algorithm where the within-temperature moves are given by a Random Walk Metropolis (RWM) algorithm with Gaussian proposal distribution N(x, (2.38²/(dβ)) I_d), which is asymptotically optimal [22].
Our algorithm attempts 20 within-temperature moves for every one time it attempts a temperature swap. We ran this algorithm for each of 50 different possible choices of β_1, each with 0 < β_1 < 1. For each such choice of β_1, we ran the algorithm for a total of 500,000 iterations to ensure good averaging, and then estimated the acceptance probability as the fraction of proposed swaps accepted, and the expected squared jump distance (ESJD) as the average of the squared temperature jump distances (γ − β_0)² (or, equivalently, as (β_0 − β_1)² times the fraction of accepted swaps). Figure 3 plots the estimated squared jump distances (SJD) against the estimated acceptance probabilities, for each of four different dimensions (10, 20, 50, and 100). We can see from these results that the swap acceptance probability at the maximal SJD is indeed close to the optimal value, and it gets closer to optimal as the dimension increases, and furthermore the relationship between

the two quantities is given approximately by (8) and Figure 2, just as Theorem 1 predicts.

Figure 3: Squared jump distance (SJD) versus estimated Acceptance Probability for the Gaussian case f = N(0, 1) and f^β = N(0, β^{−1}), in dimensions (a) d = 10, (b) d = 20, (c) d = 50, and (d) d = 100.

3.2. Inhomogeneous Gaussian Example. We now modify the previous example to a non-i.i.d. case where f(x) = ∏_{i=1}^d f_i(x_i) with f_i = N(0, i²), so that f_i^β = N(0, i² β^{−1}). (The inhomogeneity implies, in particular, that assumption (5) is no longer satisfied.) The rest of the details remain exactly the same as for the previous example, except that the proposal distribution for the RWM within-temperature moves is now taken to be N(x_i, 2.38² i²/(dβ)) for the i-th coordinate, which is optimal in this case. The resulting plots (for dimensions d = 10 and d = 100) are given in Figure 4. Here again, an optimum value of β_1 for the ESJD function emerges at a swap acceptance probability of approximately 23%. Again, the agreement with Theorem 1 and the relationship (8) and Figure 2 is striking. This is unsurprising given that this target distribution can be written as a collection of scalar linear transformations of Example 3.1, and Theorem 1 remains invariant under such transformations.

Figure 4: SJD versus Acceptance Probability for the inhomogeneous Gaussian case f_i = N(0, i²) and f_i^β = N(0, i² β^{−1}), in dimensions (a) d = 10 and (b) d = 100.

3.3. A Mixture Distribution Example. In this example and the next, we compare the two temperature scheduling strategies discussed in Sections 2.2 and 2.3. One strategy consists of selecting the inverse temperatures as a geometric sequence; we saw that this strategy is optimal when I(β) ∝ β^{−2}, or equivalently when the heat capacity is constant. The other strategy consists of choosing the temperatures such that the acceptance probability between any two adjacent temperatures is about 23%. We report here a simulation example comparing the two strategies. Consider the density f(x_1, ..., x_d) = ∏_{i=1}^d f(x_i; ω, μ, σ²) corresponding to a mixture of normal distributions, where d = 20, and

    f(x; ω, μ, σ²) = Σ_{j=1}^3 ω_j (1/√(2π σ_j²)) e^{−(x − μ_j)²/(2σ_j²)},

with ω = (1/3, 1/3, 1/3), μ = (−5, 0, 5), and σ = (0.2, 0.2, 0.2). For the 23% rule, we build the temperatures sequentially as described in Section 2.2, with β = 0.01. The number of chains is thus itself unknown initially, and turns out to be 9, as shown in the top line of Table 1. The geometric schedule uses that same number (9) of chains, geometrically spaced between β_0 ≡ 1 and the same minimal inverse temperature. (Thus, the geometric schedule is allowed to borrow the total number of chains from the 23% rule, which is in some sense overly generous to the geometric strategy.) Each chain was run for 200,000 iterations. We report the squared jump distances both in the temperature space and in the X-space. We observe that the 23% rule performs significantly better than the geometric scheduling, in terms of having larger average squared jumping distances in both the β space and the X space.

Table 1: Inverse-temperature schedules (0.23 rule versus geometric spacing) for the two different temperature scheduling strategies for the mixture distribution example.
Table 2: Squared jumping distances E[|β_n − β_{n−1}|²] (β space) and E[||X_n − X_{n−1}||²] (X space), for the 0.23 rule and for geometric spacing, for the mixture distribution example.
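The sequential 23% construction of Section 2.2 used in this example can be sketched as follows. This is our own code, in an idealised setting where exact samples from f^β are available and the target is product standard normal; the function name and its defaults are ours:

```python
import math, random

def tune_next_beta(beta, d, n_steps=20000, target=0.234, seed=3):
    """Stochastic-approximation search (as in Section 2.2) for beta' < beta
    with swap acceptance rate close to `target`.  Idealisation: product
    standard-normal target, so f^b has N(0, 1/b) coordinates and
    g(x) = sum_i log f(x_i) = -sum_i x_i^2/2 (up to a constant that cancels)."""
    rng = random.Random(seed)
    rho = 0.0
    for n in range(1, n_steps + 1):
        bp = beta / (1.0 + math.exp(rho))           # candidate beta' < beta
        gx = -0.5 * sum(rng.gauss(0, 1 / math.sqrt(beta)) ** 2 for _ in range(d))
        gy = -0.5 * sum(rng.gauss(0, 1 / math.sqrt(bp)) ** 2 for _ in range(d))
        alpha = math.exp(min(0.0, (bp - beta) * (gx - gy)))   # 1 ∧ e^B
        rho += (alpha - target) / n                 # Robbins-Monro update
    return beta / (1.0 + math.exp(rho))
```

Calling this repeatedly (replacing beta with the returned value each time, until it falls below the minimal inverse temperature) builds the whole ladder.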

3.4. Ising Distribution. We now compare the two scheduling strategies using the Ising distribution on the N × N two-dimensional lattice, given by π(x) = exp(E(x))/Z, where Z is the normalising constant and

    E(x) = J ( Σ_{i=1}^N Σ_{j=1}^{N−1} x_{ij} x_{i,j+1} + Σ_{i=1}^{N−1} Σ_{j=1}^N x_{ij} x_{i+1,j} ),    (14)

with x_{ij} ∈ {−1, 1}. Obviously, this distribution does not satisfy the assumption (5). For definiteness, we choose N = 50 and J = 0.45. The Ising distribution admits a phase transition at the critical temperature T_c = 2J/log(1 + √2) ≈ 1.021, i.e. critical inverse temperature β_c ≈ 0.979, around which the heat capacity undergoes stiff variation. Tempering techniques like MCMCMC and Simulated Tempering can perform poorly near critical temperatures, because small changes in the temperature around T_c result in drastic variations of the properties of the distribution. We expect the 23% rule to outperform the geometric schedule in this case.

For our algorithm, for the within-temperature (i.e., X-space) moves, we use a Metropolis-Hastings algorithm with a proposal which consists of randomly selecting a position (i, j) on the lattice and flipping its spin from x_{ij} to −x_{ij}. We first determine iteratively the inverse-temperature points using the algorithm described in Section 2.2. Then, given the lowest and highest temperatures and the number of temperature points determined by this algorithm, we compute the geometric spacing. Figure 5 gives the selected temperatures (not inverse temperatures, i.e. it shows T_j ≡ 1/β_j) from the two methods (the circles represent the 23% rule and the triangles represent the geometric spacing).

Figure 5: The temperatures {T_j} selected for the Ising model (triangles: geometric spacing; circles: 23% rule).

We note here that the 23% rule and the geometric spacing produce very different temperature points.
Interestingly, in order to maintain the right acceptance rate, the 23% rule puts more temperature points near the critical temperature (1.021); this is further illustrated in Table 3.
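For comparison, the geometric spacing referred to above is simply a constant-ratio ladder between the lowest and highest temperatures. A minimal sketch (our own helper, not from the paper):

```python
def geometric_temperatures(t_min, t_max, n_points):
    """Geometric temperature schedule: T_j = t_min * r**j, with the ratio r chosen
    so that the last point equals t_max, i.e. T_{j+1}/T_j is constant."""
    r = (t_max / t_min) ** (1.0 / (n_points - 1))
    return [t_min * r ** j for j in range(n_points)]
```

Because the ratio is constant, a geometric schedule cannot concentrate points near T_c, which is exactly where the Ising example needs them.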

23% rule        Geometric rule

Table 3: Selected temperature values near T_c = 1.021 for the Ising model.

Each version of MCMCMC was run for 5 million iterations. The average square jump distance in inverse-temperature space was almost twice as large for the 23% rule as for the geometric spacing. We also present the trace plots (subsampled every 1,000 iterations) of the function {E(X_n), n ≥ 0} during the simulation for each of the two algorithms in Figure 6, together with their autocorrelation functions. While the trace plots themselves appear similar, the autocorrelations indicate that the 23% rule has resulted in a faster mixing (less correlated) sampler in the X-space as well. We confirm this by calculating the empirical average square jump distance in the X space, SJD(X) = n^{−1} Σ_{j=1}^{n} [E(X_j) − E(X_{j−1})]², which works out to be 1.6 times as large for the 23% rule as for the geometric sequence, i.e. the 23% rule is 1.6 times as efficient by this measure.

Figure 6: (a),(c) trace plots and (b),(d) autocorrelation functions of E(X_n) for the Ising model, subsampled every 1,000 iterations, with (a),(b) geometric temperature spacing, and (c),(d) the 23% rule.
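The empirical average square jump distance used here is straightforward to compute from a stored trace of E(X_n) (or of β_n). A small helper, ours for illustration:

```python
def avg_square_jump(trace):
    """Empirical average square jumping distance of a scalar trace:
    SJD = n^{-1} * sum_{j=1}^{n} (h_j - h_{j-1})^2."""
    n = len(trace) - 1
    return sum((trace[j] - trace[j - 1]) ** 2 for j in range(1, n + 1)) / n
```

Applying this to the two samplers' traces gives the 1.6x efficiency ratio reported above.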

4. Simulated Tempering

We now consider the Simulated Tempering algorithm [16, 7]. This is somewhat similar to MCMCMC, except now there is just one particle which can jump either within-temperature or between-temperatures. Again we focus on the between-temperatures move, and keep similar notation to before. The state space is now given by {β_0, β_1, ..., β_n} × X and the target density is proportional to

f(β, x) = Π_{i=1}^{d} e^{K(β)} f^{β}(x_i) = e^{dK(β)} f^{(β)}(x)

for some choice of normalizing constants K(β). A proposed temperature move from (β, x) to (β + ε, x) is accepted with probability

α = 1 ∧ ( e^{dK(β+ε)} f^{(β+ε)}(x) ) / ( e^{dK(β)} f^{(β)}(x) ).

We again make the simplifying assumptions (5) and (6), and again let ε → 0 according to (7). We further assume that we have chosen the true normalising constants,

K(β) = − log ( ∫ f^{β}(x) dx ),   (15)

so that f(β, ·) is indeed a normalised density function. This choice makes the stationary distribution for β be uniform on {β_0, β_1, ..., β_n}; this appears to be optimal since it allows the β values to most easily and quickly migrate from the cold temperature β_0 to the hot temperature β_n and then back to β_0 again, thus maximising the influence of the hot temperature on the mixing of the cold chain.

4.1. Main Result. Under the above assumptions, we prove the following analog of Theorem 1:

Theorem 2. Consider the Simulated Tempering algorithm as described above, assuming (5), (6), (7), and (15). Then as d → ∞, the ESJD of (2) is maximized when l is chosen to maximize l² · 2Φ(−l I(β)^{1/2}/2). Furthermore, for this optimal choice of l, the corresponding probability of accepting a proposed swap is given (to three decimal points) by 0.234. In fact, the relationship between ESJD of (2) and ACC of (3) is given by

ESJD = (4/I(β)) ACC ( Φ^{−1}(ACC/2) )²   (16)

(see Figure 7). Finally, this choice of l also maximises the ESJD(h) of (4), for any differentiable function h : [0, 1] → R.
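Theorem 2's optimum is easy to check numerically. In units where I(β) = 1, maximising the efficiency expression l² · 2Φ(−l/2) over a grid recovers an acceptance rate of about 0.234. A quick standard-library sketch (the grid search and function names are our own illustration):

```python
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def e_st(l):
    # the ESJD expression of Theorem 2, in units where I(beta) = 1
    return l * l * 2.0 * Phi(-l / 2.0)

# crude grid search for the maximising scaling constant l
grid = [k * 0.001 for k in range(1, 6000)]
l_opt = max(grid, key=e_st)
acc_opt = 2.0 * Phi(-l_opt / 2.0)  # acceptance rate at the optimum
```

Running this gives l_opt near 2.38 and acc_opt near 0.234, matching the theorem.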

Figure 7: A graph of the relationship between the expected square jumping distance (ESJD) and asymptotic acceptance rate (ACC), as described in equations (16) (top, green) and (8) (bottom, blue), in units where I(β) = 1.

Proof. A proposed temperature move from (β, x) to (β + ε, x) is accepted with probability

α = 1 ∧ ( e^{dK(β+ε)} f^{(β+ε)}(x) ) / ( e^{dK(β)} f^{(β)}(x) )
  = 1 ∧ e^{d(K(β+ε) − K(β))} e^{ε Σ_{i=1}^{d} g(x_i)}
  = 1 ∧ e^{d(K(β+ε) − K(β)) + ε d M(β)} e^{ε Σ_{i=1}^{d} ḡ(x_i)},

where ḡ(x) = g(x) − M(β). If we choose K(β) = − log ( ∫ f^{β}(x) dx ), then we compute (again using the quotient rule) by comparison with (9) that K′(β) = −M(β). Hence, from (11), K″(β) = −M′(β) = −I(β). Therefore, by a Taylor series expansion,

K(β + ε) − K(β) = ε K′(β) + (1/2) ε² K″(β_ε) = −ε M(β) − (1/2) ε² I(β_ε)

for some β_ε ∈ [β, β + ε]. It follows that

d(K(β + ε) − K(β)) + ε d M(β) = d ε² K″(β_ε)/2 = (l²/2) K″(β_ε) = −(l²/2) I(β_ε).

Setting ε = d^{−1/2} l for some l > 0 and letting d → ∞, it follows from the above and the central limit theorem that

lim_{d→∞} 1 ∧ ( e^{dK(β+ε)} f^{(β+ε)}(x) ) / ( e^{dK(β)} f^{(β)}(x) )
  = lim_{d→∞} 1 ∧ ( e^{−(l²/2) I(β_ε)} e^{ε Σ_{i=1}^{d} ḡ(x_i)} )
  = lim_{d→∞} 1 ∧ ( e^{−(l²/2) I(β)} e^{l d^{−1/2} Σ_{i=1}^{d} ḡ(x_i)} )
  = 1 ∧ e^A  (in distribution),

where A ~ N(−(l²/2) I(β), l² Var_β(g)) = N(−(l²/2) I(β), l² I(β)). The rest of the argument is standard, just as in Theorem 1, and shows that for large d, the square jump distance of Simulated Tempering is approximately

ESJD = l² · 2Φ(−l I(β)^{1/2}/2) ≡ e_ST(l),   (17)

which is again maximised by choosing l such that the acceptance probability is (to three decimal places) equal to 0.234. The argument for (16) is identical to that for Theorem 1.

4.2. Comparison of Simulated Tempering and MCMCMC. Now that we have optimality results for both Simulated Tempering (Theorem 2) and MCMCMC (Theorem 1), it is natural to compare them. We have the following.

Corollary 1. Under assumptions (5), (6), (7), and (15), asymptotically as d → ∞, the maximal value of ESJD for Simulated Tempering is precisely twice that for MCMCMC (cf. Figure 7). (More generally, the maximal ESJD(h) for Simulated Tempering is precisely twice that for MCMCMC, for any differentiable h.) Furthermore, the optimal choice of l for Simulated Tempering is √2 times that for MCMCMC.

Proof. Comparing e_MC(l) from (12) with e_ST(l) from (17), we see that e_MC(l) = (1/2) e_ST(l√2). In particular, sup_l e_MC(l) = (1/2) sup_l e_ST(l), and also argsup_l e_MC(l) = (1/√2) argsup_l e_ST(l), which gives the result.

Corollary 1 says that, under appropriate conditions, the ESJD of MCMCMC is precisely half that of Simulated Tempering. That is, if Simulated Tempering has ideally-chosen normalisation constants K(β) as in (15), then in some sense it is twice as efficient as MCMCMC. However, choosing K(β) ideally may be difficult or impossible (though it may be possible in certain cases to learn these weights adaptively during the simulation, see e.g. [4]), and for non-optimal K(β) this comparison no longer applies. By contrast, MCMCMC does not require K(β) at all. Furthermore, a single temperature swap of MCMCMC updates two different chain values, and thus may be twice as valuable as a single temperature move under Simulated Tempering. If so, then the two different factors of two precisely cancel each other out. In addition, it may take more computation to do one within-temperature step of MCMCMC (involving n + 1 particles) than one step of Simulated Tempering (involving just one particle).
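The factor-of-two relationship of Corollary 1 can also be verified numerically from the identity e_MC(l) = (1/2) e_ST(l√2) used in its proof, again in units where I(β) = 1 (a sketch under that assumption; the grid search is ours):

```python
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def e_st(l):
    # Simulated Tempering efficiency, as in (17), with I(beta) = 1
    return 2.0 * l * l * Phi(-l / 2.0)

def e_mc(l):
    # MCMCMC efficiency via the identity from the proof of Corollary 1
    return 0.5 * e_st(l * math.sqrt(2.0))

grid = [k * 0.001 for k in range(1, 8000)]
l_st = max(grid, key=e_st)  # optimal l for Simulated Tempering
l_mc = max(grid, key=e_mc)  # optimal l for MCMCMC
```

Numerically, e_st(l_st) is twice e_mc(l_mc), and l_st is √2 times l_mc, as the corollary asserts.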
In summary, this comparison of the two algorithms involves various subtleties and is not entirely clear-cut, though Corollary 1 still provides certain insights into the relationship between them.

4.3. Langevin Dynamics for Simulated Tempering? One limitation of Simulated Tempering is that it performs a Random Walk Metropolis (RWM) algorithm in the β-space, where each proposal bears a high chance of being rejected if the value of the energy Σ_{i=1}^{d} g(x_i) is not consistent with the proposed value of temperature. Now, it is known [21] that Langevin dynamics (which use derivatives to improve the proposal distribution) are significantly more efficient than RWM when they can be applied. In the context of the inverse temperatures, a Langevin-style proposal distribution could take the form:

N[ β + (σ²/2) ( Σ_{i=1}^{d} g(x_i) + d K′(β) ), σ² ].
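Such a Langevin-style β-proposal is easy to simulate once an estimate of K′(β) is available. The sketch below is our own illustration of the displayed normal proposal; `dKdbeta` stands in for the generally unknown gradient K′(β), which in practice would have to be estimated:

```python
def langevin_beta_proposal(beta, g_values, dKdbeta, sigma, rng):
    """Draw beta' ~ N(beta + (sigma^2/2) * (sum_i g(x_i) + d * K'(beta)), sigma^2).
    Since K'(beta) = -E_beta[g(X)], the drift compares the current energy
    sum_i g(x_i) with its stationary mean -d * K'(beta)."""
    d = len(g_values)
    drift = 0.5 * sigma ** 2 * (sum(g_values) + d * dKdbeta)
    return beta + drift + sigma * rng.gauss(0.0, 1.0)
```

When the current energy Σ g(x_i) falls below its stationary mean, the drift is negative and the proposal favours smaller β, which is exactly the behaviour described in the text.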

Furthermore, in this case K′(β) = −E_β(g(X)). Therefore, such a Langevin proposal will compare the current value of Σ_{i=1}^{d} g(x_i) to the average value −d K′(β). If Σ_{i=1}^{d} g(x_i) > −d K′(β), then smaller temperatures are more compatible and are more likely to be proposed (and vice versa). The main limitation of this idea is that in practice, we do not know the gradient K′(β). This can perhaps be estimated during the simulation as the average of the energy Σ_{i=1}^{d} g(x_{n,i}) at times n where the inverse-temperature level β is visited, but this needs more investigation and we do not pursue it here.

5. Discussion and Future Work

This paper has presented certain results about optimal inverse-temperature spacings. In particular, we have proved that for MCMCMC (Theorem 1) and Simulated Tempering (Theorem 2), it is optimal (under certain conditions) to space the inverse-temperatures so that the probability of accepting a temperature swap or move is approximately 23%. Our theorems were proved under the restrictive assumption (5), but we have seen in simulations (Section 3) that they continue to approximately apply even if this assumption is violated. Theorems 1 and 2 were stated in terms of the expected square jumping distances (ESJD) of the inverse-temperatures, assuming the chain begins in stationarity. While this provides useful information about optimising the algorithm, it provides less information about the algorithm's long-term behaviour. In this section, we consider various related issues and possible future research directions.

5.1. Inverse-Temperature Process. It is possible to define an entire inverse temperature process, {y_n}_{n=0}^{∞}, living on the state space Y = {β_0, β_1, ..., β_n}. For Simulated Tempering (ST), this process is simply the inverse-temperature β at each iteration. For MCMCMC, this process involves tracking the influence of a particular inverse-temperature's chain through the temperature swaps, i.e.
if y_n = β_j, and then the β_j and β_{j+1} chains successfully swap, then y_{n+1} = β_{j+1} (otherwise y_{n+1} = β_j). In general, this {y_n} process will not be Markovian. However, if we assume that the ST/MCMCMC algorithm does an automatic i.i.d. reset into the distribution f^{(β)} immediately upon entering the inverse-temperature β (i.e., where the chain values associated with each inverse-temperature β are taken as i.i.d. draws from f^{(β)} at each iteration), then {y_n} becomes a Markov chain. Indeed, under this assumption, it corresponds to a simple random walk (with holding probabilities) on Y. In that case, it seems likely that as d → ∞, if we appropriately shrink space by a factor of d and speed up time by a factor of d, then as in the RWM case [20, 22], the {y_n} will converge to a limiting Langevin diffusion process. In that case, maximising the speed of the limiting diffusion is equivalent to optimising the original algorithm (according to any optimisation measure, see [22]), and this would provide another justification of the values of l_opt in Theorems 1 and 2. In terms of this {y_n} Markov chain, ESJD from (2) is simply the expected square jumping distance of this process. Furthermore, ESJD(h) from (4) is then the Dirichlet form corresponding to the Markov operator for {y_n}. Thus, Theorems 1 and 2

would also be saying that the given values of l_opt maximise the associated Dirichlet form.

Of course, this reset assumption, that the moves within each inverse temperature β_j result in an immediate i.i.d. reset to the density f^{(β_j)}, is not realistic, since in true applications the convergence within each fixed temperature chain would occur only gradually (especially for larger β_j). However, it is not entirely unrealistic either, for a number of reasons. For example, some algorithms might run many within-temperature moves for each single attempted temperature swap, thus making the within-temperature chains effectively mix much faster. Also, some within-temperature algorithms (e.g. Langevin diffusions, see [21]) have convergence time of smaller order (O(d^{1/3})) than that of the temperature-swapping random-walk behaviour (O(d)), so effectively the within-temperature chains converge immediately on a relative scale, which is equivalent to the reset assumption. Proving convergence to limiting diffusions can be rather technical (see e.g. [20, 21]), so we do not pursue it here, but rather leave it for future work. In any case, assuming the limiting diffusion does hold (as it probably does), this provides new insight and context for the optimal behaviour of MCMCMC and ST algorithms.

5.2. Joint Diffusion Limits? If we do not assume the reset assumption as above, then the process {y_n} is not Markovian, so a diffusion limit seems far less certain. However, it is still possible to jointly consider the chain {x_n}_{n=0}^{∞} living on X (for ST) or X^{n+1} (for MCMCMC), together with the inverse temperature process {y_n}. The joint process (x_n, y_n) must be Markovian, and it is quite possible that it has its own joint diffusion limit and joint optimal scalings. This would become quite technical, and we do not pursue it here, but we consider it an interesting topic for future research.

5.3. Models For How the Algorithms Converge.
Related to the above is the question of a deeper understanding of how MCMCMC and ST make use of the tempered distributions to improve convergence. Significant progress was made in this direction in previous research (e.g. [26]), but more remains to be done. One possible simplified model is to assume the process does not move at all in the within-temperature direction, except at the single inverse-temperature β = β̄, when it does an immediate i.i.d. reset to f^{(β̄)}. At first we thought that such a model is insufficient, since it would then be exponentially difficult for the algorithm to climb from β = β̄ to β = β_0, but the work of [26] shows that this is in fact not the case. So, we may explore this model further in future work. A compromise model is where the state space X is partitioned into a finite number of modes, and when entering any β > 0 the process does a reset into f^{(β)} conditional on remaining in the current mode. Such a process assumes fast within-mode mixing at all temperatures, but between-mode mixing only at the hot temperature when β = 0. We believe that in this case, there is a joint diffusion limit of the joint (β, mode) process. If so, this would be interesting and provide a good model for how the hot temperature and intermediate temperature mixing all works together.

5.4. State space invariance. Our results do not depend in any way on the update mechanism within the state space X. Moreover, although our results are proved

for the case of IID components, this observation immediately generalises our findings to the very large class of distributions which we can write as functions of IID components. In fact, the only thing which stops this being a completely general d-dimensional state-space result is the fact that ST on the transformed variables is a different method from the transformed ST algorithm. From a practical point of view, this might even suggest improved tempering schemes (rather than simply considering powered densities).

5.5. Implementation. Finding practical ways to implement MCMCMC and ST in light of our results has not been a focus of this paper, though this is an important issue for future consideration. The approach adopted in Subsection 2.2 is designed to be a thorough methodology to investigate the effects of Theorem 1, but cannot be considered a practical approach in general. Intriguing is the prospect of integrating our theory within an adaptive MCMC framework, as in for example [3, 1, 24, 23].

5.6. Parallelisation. As computational power becomes more and more widespread, there is increasing interest in running Monte Carlo algorithms in parallel on many machines, perhaps even using Graphics Processing Units (GPUs), see e.g. [8, 25, 14]. It would be interesting to consider, even in simple situations, how to optimise parallel computations asymptotically as the number of processors goes to infinity.

Acknowledgements: We thank the editors and anonymous referees for very helpful reports that significantly improved the presentation of our results.

References

[1] C. Andrieu and E. Moulines (2007), On the ergodicity properties of some Markov chain Monte Carlo algorithms. Ann. Appl. Prob. 44(2).
[2] C. Andrieu and C.P. Robert (2001), Controlled MCMC for optimal sampling. Preprint.
[3] Y.F. Atchadé (2006), An adaptive version of the Metropolis adjusted Langevin algorithm with a truncated drift. Meth. and Comp. in Appl. Prob. 8(2).
[4] Y.F. Atchadé and J.S.
Liu (2004), The Wang-Landau algorithm in general state spaces: applications and convergence analysis. Statistica Sinica, to appear.
[5] B. Cooke and S.C. Schmidler (2008), Preserving the Boltzmann ensemble in replica-exchange molecular dynamics. J. Chem. Phys. 129.
[6] D.J. Earl and M.W. Deem (2005), Parallel tempering: theory, applications, and new perspectives. J. Phys. Chem. B 108.
[7] C.J. Geyer (1991), Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface.
[8] P.W. Glynn and P. Heidelberger (1991), Analysis of parallel, replicated simulations under a completion time constraint. ACM Trans. on Modeling and Simulation 1.
[9] W.K. Hastings (1970), Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57.
[10] Y. Iba (2001), Extended ensemble Monte Carlo. Int. J. Mod. Phys. C 12(5).
[11] D.A. Kofke (2002), On the acceptance probability of replica-exchange Monte Carlo trials. J. Chem. Phys. 117. Erratum: J. Chem. Phys. 120.
[12] D.A. Kofke (2004), Comment on reference [18]. J. Chem. Phys. 121, 1167.


More information

A Second Time Dimension, Hidden in Plain Sight

A Second Time Dimension, Hidden in Plain Sight A Secon Time Dimension, Hien in Plain Sight Brett A Collins. In this paper I postulate the existence of a secon time imension, making five imensions, three space imensions an two time imensions. I will

More information

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January

More information

Section 2.7 Derivatives of powers of functions

Section 2.7 Derivatives of powers of functions Section 2.7 Derivatives of powers of functions (3/19/08) Overview: In this section we iscuss the Chain Rule formula for the erivatives of composite functions that are forme by taking powers of other functions.

More information

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides Reference 1: Transformations of Graphs an En Behavior of Polynomial Graphs Transformations of graphs aitive constant constant on the outsie g(x) = + c Make graph of g by aing c to the y-values on the graph

More information

Permanent vs. Determinant

Permanent vs. Determinant Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

Generalization of the persistent random walk to dimensions greater than 1

Generalization of the persistent random walk to dimensions greater than 1 PHYSICAL REVIEW E VOLUME 58, NUMBER 6 DECEMBER 1998 Generalization of the persistent ranom walk to imensions greater than 1 Marián Boguñá, Josep M. Porrà, an Jaume Masoliver Departament e Física Fonamental,

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

Quantum Mechanics in Three Dimensions

Quantum Mechanics in Three Dimensions Physics 342 Lecture 20 Quantum Mechanics in Three Dimensions Lecture 20 Physics 342 Quantum Mechanics I Monay, March 24th, 2008 We begin our spherical solutions with the simplest possible case zero potential.

More information

Jointly continuous distributions and the multivariate Normal

Jointly continuous distributions and the multivariate Normal Jointly continuous istributions an the multivariate Normal Márton alázs an álint Tóth October 3, 04 This little write-up is part of important founations of probability that were left out of the unit Probability

More information

Lower Bounds for the Smoothed Number of Pareto optimal Solutions

Lower Bounds for the Smoothed Number of Pareto optimal Solutions Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.

More information

SYDE 112, LECTURE 1: Review & Antidifferentiation

SYDE 112, LECTURE 1: Review & Antidifferentiation SYDE 112, LECTURE 1: Review & Antiifferentiation 1 Course Information For a etaile breakown of the course content an available resources, see the Course Outline. Other relevant information for this section

More information

18 EVEN MORE CALCULUS

18 EVEN MORE CALCULUS 8 EVEN MORE CALCULUS Chapter 8 Even More Calculus Objectives After stuing this chapter you shoul be able to ifferentiate an integrate basic trigonometric functions; unerstan how to calculate rates of change;

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

3.2 Differentiability

3.2 Differentiability Section 3 Differentiability 09 3 Differentiability What you will learn about How f (a) Might Fail to Eist Differentiability Implies Local Linearity Numerical Derivatives on a Calculator Differentiability

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

Entanglement is not very useful for estimating multiple phases

Entanglement is not very useful for estimating multiple phases PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

Cascaded redundancy reduction

Cascaded redundancy reduction Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,

More information

Some Examples. Uniform motion. Poisson processes on the real line

Some Examples. Uniform motion. Poisson processes on the real line Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]

More information

Chapter 4. Electrostatics of Macroscopic Media

Chapter 4. Electrostatics of Macroscopic Media Chapter 4. Electrostatics of Macroscopic Meia 4.1 Multipole Expansion Approximate potentials at large istances 3 x' x' (x') x x' x x Fig 4.1 We consier the potential in the far-fiel region (see Fig. 4.1

More information

ELECTRON DIFFRACTION

ELECTRON DIFFRACTION ELECTRON DIFFRACTION Electrons : wave or quanta? Measurement of wavelength an momentum of electrons. Introuction Electrons isplay both wave an particle properties. What is the relationship between the

More information

7.1 Support Vector Machine

7.1 Support Vector Machine 67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to

More information

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1 Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

Least-Squares Regression on Sparse Spaces

Least-Squares Regression on Sparse Spaces Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction

More information

Introduction to Markov Processes

Introduction to Markov Processes Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav

More information

QF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim

QF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim QF101: Quantitative Finance September 5, 2017 Week 3: Derivatives Facilitator: Christopher Ting AY 2017/2018 I recoil with ismay an horror at this lamentable plague of functions which o not have erivatives.

More information

Conservation Laws. Chapter Conservation of Energy

Conservation Laws. Chapter Conservation of Energy 20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action

More information

Implicit Differentiation

Implicit Differentiation Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,

More information

Lower bounds on Locality Sensitive Hashing

Lower bounds on Locality Sensitive Hashing Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

Bohr Model of the Hydrogen Atom

Bohr Model of the Hydrogen Atom Class 2 page 1 Bohr Moel of the Hyrogen Atom The Bohr Moel of the hyrogen atom assumes that the atom consists of one electron orbiting a positively charge nucleus. Although it oes NOT o a goo job of escribing

More information

Sharp Thresholds. Zachary Hamaker. March 15, 2010

Sharp Thresholds. Zachary Hamaker. March 15, 2010 Sharp Threshols Zachary Hamaker March 15, 2010 Abstract The Kolmogorov Zero-One law states that for tail events on infinite-imensional probability spaces, the probability must be either zero or one. Behavior

More information

Switching Time Optimization in Discretized Hybrid Dynamical Systems

Switching Time Optimization in Discretized Hybrid Dynamical Systems Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set

More information

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

Markov Chains in Continuous Time

Markov Chains in Continuous Time Chapter 23 Markov Chains in Continuous Time Previously we looke at Markov chains, where the transitions betweenstatesoccurreatspecifietime- steps. That it, we mae time (a continuous variable) avance in

More information

Vectors in two dimensions

Vectors in two dimensions Vectors in two imensions Until now, we have been working in one imension only The main reason for this is to become familiar with the main physical ieas like Newton s secon law, without the aitional complication

More information

1 Math 285 Homework Problem List for S2016

1 Math 285 Homework Problem List for S2016 1 Math 85 Homework Problem List for S016 Note: solutions to Lawler Problems will appear after all of the Lecture Note Solutions. 1.1 Homework 1. Due Friay, April 8, 016 Look at from lecture note exercises:

More information

Qubit channels that achieve capacity with two states

Qubit channels that achieve capacity with two states Qubit channels that achieve capacity with two states Dominic W. Berry Department of Physics, The University of Queenslan, Brisbane, Queenslan 4072, Australia Receive 22 December 2004; publishe 22 March

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in

More information

Situation awareness of power system based on static voltage security region

Situation awareness of power system based on static voltage security region The 6th International Conference on Renewable Power Generation (RPG) 19 20 October 2017 Situation awareness of power system base on static voltage security region Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

arxiv: v4 [math.pr] 27 Jul 2016

arxiv: v4 [math.pr] 27 Jul 2016 The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,

More information

26.1 Metropolis method

26.1 Metropolis method CS880: Approximations Algorithms Scribe: Dave Anrzejewski Lecturer: Shuchi Chawla Topic: Metropolis metho, volume estimation Date: 4/26/07 The previous lecture iscusse they some of the key concepts of

More information

Non-Linear Bayesian CBRN Source Term Estimation

Non-Linear Bayesian CBRN Source Term Estimation Non-Linear Bayesian CBRN Source Term Estimation Peter Robins Hazar Assessment, Simulation an Preiction Group Dstl Porton Down, UK. probins@stl.gov.uk Paul Thomas Hazar Assessment, Simulation an Preiction

More information

Math 210 Midterm #1 Review

Math 210 Midterm #1 Review Math 20 Miterm # Review This ocument is intene to be a rough outline of what you are expecte to have learne an retaine from this course to be prepare for the first miterm. : Functions Definition: A function

More information