Mean Field Variational Approximation for Continuous-Time Bayesian Networks


Ido Cohn, Tal El-Hay, Nir Friedman
School of Computer Science, The Hebrew University

Raz Kupferman
Institute of Mathematics, The Hebrew University

Abstract

Continuous-time Bayesian networks is a natural structured representation language for multi-component stochastic processes that evolve continuously over time. Despite the compact representation, inference in such models is intractable even in relatively simple structured networks. Here we introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. We provide the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equation (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem.

1 Introduction

Many real-life processes can be naturally thought of as evolving continuously in time. Examples cover a diverse range, including server availability, changes in socioeconomic status, and genetic sequence evolution. To realistically model such processes, we need to reason about systems that are composed of multiple components (e.g., many servers in a server farm, multiple residues in a protein sequence) and evolve in continuous time. Continuous-time Bayesian networks (CTBNs) provide a representation language for such processes, which allows us to naturally exploit sparse patterns of interactions to compactly represent the dynamics of such processes [9].

Inference in multi-component temporal models is a notoriously hard problem [1]. As in discrete-time processes, inference is exponential in the number of components, even in a CTBN with sparse interactions [9]. Thus, we have to resort to approximate inference methods. The recent literature has adapted several strategies from discrete graphical models to CTBNs. These include sampling-based approaches: Fan and Shelton [5] introduced a likelihood-weighted sampling scheme, and more recently we [4] introduced a Gibbs-sampling procedure. Such sampling-based approaches yield more accurate answers with the investment of additional computation. However, it is hard to bound the required time in advance, tune the stopping criteria, or estimate the error of the approximation. An alternative class of approximations is based on variational principles. Recently, Nodelman et al. [11] introduced an Expectation Propagation approach, which can be roughly described as a local message-passing scheme, where each message describes the dynamics of a single component over an interval. This message-passing procedure can automatically refine the number of intervals according to the complexity of the underlying system [14]. Nonetheless, it does suffer from several caveats. On the formal level, the approximation has no convergence guarantees. Second, upon convergence, the computed marginals do not necessarily form a globally consistent distribution. Third, it is restricted to approximations in the form of piecewise-homogeneous messages on each interval, so the refinement of the number of intervals depends on the fit of such homogeneous approximations to the target process.
Finally, the approximation of Nodelman et al. does not provide a provable approximation of the likelihood of the observations, a crucial component in learning procedures.

Here, we develop an alternative variational approximation, which provides a different trade-off. We use the strategy of structured variational approximations in graphical models [8], and specifically the variational approach of Opper and Sanguinetti [12] for approximate inference in Markov jump processes, a related class of models (see below). The resulting procedure approximates the posterior distribution of the CTBN as a product of independent components, each of which is an inhomogeneous continuous-time Markov process. As we show, by using a natural representation of these processes, we derive a variational procedure that is efficient and provides a good approximation both for the likelihood of the evidence and for the expected sufficient statistics. In particular, the approximation provides a lower bound on the likelihood, and thus is attractive for use in learning.

2 Continuous-Time Bayesian Networks

Consider a $D$-component Markov process $X^{(t)} = (X_1^{(t)}, X_2^{(t)}, \ldots, X_D^{(t)})$ with state space $S = S_1 \times S_2 \times \cdots \times S_D$. A notational convention: vectors are denoted by boldface symbols, e.g., $x$, and matrices are denoted by blackboard-style characters, e.g., $\mathbb{Q}$. The states in $S$ are denoted by vectors of indexes, $x = (x_1, \ldots, x_D)$. We use indexes $1 \le i, j \le D$ for enumerating components, and $X^{(t)}$ and $X_i^{(t)}$ to denote the random variables describing the state of the process and of its $i$'th component at time $t$.

The dynamics of a time-homogeneous continuous-time Markov process are fully determined by the Markov transition function,
$$p_{x,y}(t) = \Pr(X^{(t+s)} = y \mid X^{(s)} = x),$$
where time-homogeneity implies that the right-hand side does not depend on $s$. These dynamics are fully captured by a matrix $\mathbb{Q}$, the rate matrix, with non-negative off-diagonal entries $q_{x,y}$ and diagonal entries $q_{x,x} = -\sum_{y \ne x} q_{x,y}$. This rate matrix defines the transition probabilities
$$p_{x,y}(h) = \delta_{x,y} + q_{x,y}\,h + o(h),$$
where $\delta_{x,y}$ is a multivariate Kronecker delta and $o(\cdot)$ means decay to zero faster than its argument. Using the rate matrix $\mathbb{Q}$, we can express the Markov transition function as $p_{x,y}(t) = [\exp(t\mathbb{Q})]_{x,y}$, where $\exp(t\mathbb{Q})$ is a matrix exponential [2, 7].

A continuous-time Bayesian network is defined by assigning each component $i$ a set of components $\mathrm{Pa}_i \subseteq \{1,\ldots,D\} \setminus \{i\}$, which are its parents in the network [9]. With each component $i$ we then associate a set of conditional rate matrices $\mathbb{Q}^{i|\mathrm{Pa}_i}_{\cdot|u_i}$, one for each state $u_i$ of $\mathrm{Pa}_i$. The off-diagonal entries $q^{i|\mathrm{Pa}_i}_{x_i,y_i|u_i}$ represent the rate at which $X_i$ transitions from state $x_i$ to state $y_i$ given that its parents are in state $u_i$. The dynamics of $X^{(t)}$ are defined by a rate matrix $\mathbb{Q}$ with entries $q_{x,y}$, which amalgamates the conditional rate matrices as follows:
$$q_{x,y} = \begin{cases} q^{i|\mathrm{Pa}_i}_{x_i,y_i|u_i} & \delta(x,y) = \{i\} \\ \sum_i q^{i|\mathrm{Pa}_i}_{x_i,x_i|u_i} & x = y \\ 0 & \text{otherwise,} \end{cases} \qquad (1)$$
where $\delta(x,y) = \{i \mid x_i \ne y_i\}$. This definition implies that changes occur one component at a time.

Given a continuous-time Bayesian network, we would like to evaluate the likelihood of evidence, to compute the probability of various events given the evidence (e.g., that the state of the system at time $t$ is $x$), and to compute conditional expectations (e.g., the expected amount of time $X_i$ was in state $x_i$). Direct computations of these quantities involve matrix exponentials of the rate matrix $\mathbb{Q}$, whose size is exponential in the number of components, making this approach infeasible beyond a modest number of components. We therefore have to resort to approximations.
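To make Eq. (1) concrete, here is a small sketch (our illustration, not code from the paper) that amalgamates the conditional rate matrices of a toy two-component network X1 -> X2 into the joint rate matrix; all numeric rates are invented:

```python
import numpy as np
from itertools import product

# Q1[x1, y1]: rates of X1, which has no parents.
Q1 = np.array([[-1.0, 1.0],
               [ 2.0, -2.0]])

# Q2[u1][x2, y2]: conditional rates of X2 given its parent X1 = u1.
Q2 = np.array([[[-0.5, 0.5],
                [ 3.0, -3.0]],
               [[-4.0, 4.0],
                [ 0.1, -0.1]]])

states = list(product(range(2), range(2)))   # joint states x = (x1, x2)
Q = np.zeros((4, 4))
for a, (x1, x2) in enumerate(states):
    for b, (y1, y2) in enumerate(states):
        diff = [i for i in range(2) if (x1, x2)[i] != (y1, y2)[i]]
        if diff == [0]:            # only X1 changes
            Q[a, b] = Q1[x1, y1]
        elif diff == [1]:          # only X2 changes, given parent x1
            Q[a, b] = Q2[x1][x2, y2]
        # simultaneous changes of both components keep rate 0

# Diagonal: q_{x,x} amalgamates the conditional diagonals, Eq. (1) middle case.
for a, (x1, x2) in enumerate(states):
    Q[a, a] = Q1[x1, x1] + Q2[x1][x2, x2]

assert np.allclose(Q.sum(axis=1), 0.0)   # rows of a rate matrix sum to zero
```

Note how the joint matrix is $4 \times 4$ here but grows exponentially with the number of components, which is exactly why direct computation with $\exp(t\mathbb{Q})$ becomes infeasible.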
3 Variational Principle for Continuous-Time Markov Processes

We start by defining a variational approximation principle in terms of a general continuous-time Markov process (that is, without assuming any network structure). For convenience we restrict our treatment to a time interval $[0, T]$ with end-point evidence $X^{(0)} = e_0$ and $X^{(T)} = e_T$. We discuss more general types of evidence below. Here we aim to define a lower bound on $\ln P_{\mathbb{Q}}(e_T \mid e_0)$ as well as to approximate the posterior probability $P_{\mathbb{Q}}(\cdot \mid e_0, e_T)$.

Marginal Density Representation  Variational approximations cast inference as an optimization problem of a functional which approximates the log probability of the evidence by introducing an auxiliary set of variational parameters. Here we define the optimization problem over a set of mean parameters [15], representing possible values of expected sufficient statistics. As discussed above, the prior distribution of the process can be characterized by a time-independent rate matrix $\mathbb{Q}$. It is easy to show that if the prior is a Markov process, then the posterior is also a Markov process, albeit not necessarily a homogeneous one. Such a process can be represented by a time-dependent rate matrix that describes the instantaneous transition rates. Here, rather than representing the target distribution by a time-dependent rate matrix, we consider a representation that is more natural for variational approximations. Let $\Pr$ be the distribution of a Markov process. We define a family of functions:
$$\mu_x(t) = \Pr(X^{(t)} = x),$$
$$\gamma_{x,y}(t) = \lim_{h \downarrow 0} \frac{\Pr(X^{(t)} = x, X^{(t+h)} = y)}{h}, \quad y \ne x, \qquad (2)$$
$$\gamma_{x,x}(t) = -\sum_{y \ne x} \gamma_{x,y}(t).$$
The function $\mu_x(t)$ is the probability that $X^{(t)} = x$. The function $\gamma_{x,y}(t)$ is the probability density that $X$ transitions from state $x$ to $y$ at time $t$. Note that this parameter is not a transition rate, but rather a product of a point-wise probability with the point-wise transition rate of the approximating distribution; that is, $\gamma_{x,y}(t)/\mu_x(t)$ is the $(x,y)$ entry of the time-dependent rate matrix. Hence, unlike the (inhomogeneous) rate matrix at time $t$, $\gamma_{x,y}(t)$ takes into account the probability of being in state $x$, and not only the rate of transitions. This definition implies that
$$\Pr(X^{(t)} = x, X^{(t+h)} = y) = \mu_x(t)\,\delta_{x,y} + \gamma_{x,y}(t)\,h + o(h).$$

We aim to use the family of functions $\mu$ and $\gamma$ as a representation of a Markov process. To do so, we need to characterize the set of constraints that these functions should satisfy.

Definition 3.1: A family $\eta = \{\mu_x(t), \gamma_{x,y}(t) : 0 \le t \le T\}$ of continuous functions is a Markov-consistent density set if the following constraints are fulfilled:
$$\mu_x(t) \ge 0, \qquad \sum_x \mu_x(0) = 1,$$
$$\gamma_{x,y}(t) \ge 0 \ \ \forall y \ne x, \qquad \gamma_{x,x}(t) = -\sum_{y \ne x} \gamma_{x,y}(t),$$
$$\frac{d}{dt}\mu_x(t) = \sum_y \gamma_{y,x}(t).$$
Let $\mathcal{M}$ be the set of all Markov-consistent density sets. Using standard arguments we can show that there exists a correspondence between (generally inhomogeneous) Markov processes and density sets $\eta$. Specifically:

Lemma 3.2: Let $\eta = \{\mu_x(t), \gamma_{x,y}(t)\}$. If $\eta \in \mathcal{M}$, then there exists a continuous-time Markov process $P_\eta$ for which $\mu_x$ and $\gamma_{x,y}$ satisfy (2).
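For a homogeneous process, the density set of the prior can be computed in closed form, which gives a quick numerical check (ours, for illustration) of Eq. (2) and of the last constraint in Definition 3.1:

```python
import numpy as np
from scipy.linalg import expm

# For a homogeneous chain with rate matrix Q and initial distribution mu0:
# mu(t) = mu0 expm(tQ), and gamma_{x,y}(t) = mu_x(t) q_{x,y}. These satisfy
# the master-equation constraint d/dt mu_x(t) = sum_y gamma_{y,x}(t).
Q = np.array([[-1.0, 0.7, 0.3],
              [ 0.2, -0.5, 0.3],
              [ 0.4, 0.6, -1.0]])
mu0 = np.array([1.0, 0.0, 0.0])

def mu(t):
    return mu0 @ expm(t * Q)

def gamma(t):
    # gamma[y, x] is read as gamma_{y,x}(t): row = source, column = target
    return mu(t)[:, None] * Q

t, h = 0.8, 1e-6
dmu_dt = (mu(t + h) - mu(t)) / h       # numerical time derivative
master = gamma(t).sum(axis=0)          # sum over sources, diagonal included
assert np.allclose(dmu_dt, master, atol=1e-4)
```

The diagonal entries $\gamma_{x,x}(t) = \mu_x(t) q_{x,x}$ are exactly $-\sum_{y \ne x}\gamma_{x,y}(t)$, matching the convention in Definition 3.1.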
The processes we are interested in, however, have additional structure, as they correspond to the posterior distribution of a time-homogeneous process with end-point evidence. This additional structure implies that we should only consider a subset of $\mathcal{M}$:

Lemma 3.3: Let $\mathbb{Q}$ be a rate matrix, and $e_0, e_T$ be states of $X$. Then the representation $\eta$ corresponding to the posterior distribution $P_{\mathbb{Q}}(\cdot \mid e_0, e_T)$ is in the set $\mathcal{M}_e \subseteq \mathcal{M}$ that contains Markov-consistent density sets satisfying $\mu_x(0) = \delta_{x,e_0}$ and $\mu_x(T) = \delta_{x,e_T}$.

Thus, from now on we can restrict our attention to density sets from $\mathcal{M}_e$. The constraints on $\mu_x(0)$ and $\mu_x(T)$ also have consequences for $\gamma_{x,y}$ at these points.

Lemma 3.4: If $\eta \in \mathcal{M}_e$ then $\gamma_{x,y}(0) = 0$ for all $x \ne e_0$, and $\gamma_{x,y}(T) = 0$ for all $y \ne e_T$.

Variational Principle  We can now state the variational principle for continuous processes, which closely tracks similar principles for discrete processes. We define a free energy functional, $\mathcal{F}(\eta; \mathbb{Q}) = E(\eta; \mathbb{Q}) + H(\eta)$, which, as we will see, measures the quality of $\eta$ as an approximation of $P_{\mathbb{Q}}(\cdot \mid e)$. (For succinctness, we will assume that the evidence $e$ is clear from the context.) The two terms in the continuous functional correspond to an entropy,
$$H(\eta) = \int_0^T \sum_x \sum_{y \ne x} \gamma_{x,y}(t)\big[1 + \ln \mu_x(t) - \ln \gamma_{x,y}(t)\big]\, dt,$$
and an energy,
$$E(\eta; \mathbb{Q}) = \int_0^T \Big[\sum_x \mu_x(t)\, q_{x,x} + \sum_x \sum_{y \ne x} \gamma_{x,y}(t) \ln q_{x,y}\Big]\, dt.$$

Theorem 3.5: Let $\mathbb{Q}$ be a rate matrix, $e = (e_0, e_T)$ be states of $X$, and $\eta \in \mathcal{M}_e$. Then
$$\mathcal{F}(\eta; \mathbb{Q}) = \ln P_{\mathbb{Q}}(e_T \mid e_0) - \mathrm{D}\big(P_\eta(\cdot) \,\|\, P_{\mathbb{Q}}(\cdot \mid e)\big),$$
where $\mathrm{D}(P_\eta(\cdot) \| P_{\mathbb{Q}}(\cdot \mid e))$ is the KL divergence between the two processes. We conclude that $\mathcal{F}(\eta; \mathbb{Q})$ is a lower bound on the log-likelihood of the evidence, and that the closer the approximation is to the target posterior, the tighter the bound.

Proof Outline  The basic idea is to consider discrete approximations of the functional. Let $K$ be an integer. We define the $K$-sieve $X^K$ to be the set of random variables $X^{(t_0)}, X^{(t_1)}, \ldots, X^{(t_K)}$, where $t_k = k\frac{T}{K}$. We can use the variational principle [8] on the marginal distributions $P_{\mathbb{Q}}(X^K \mid e)$ and $P_\eta(X^K)$. More precisely, define
$$\mathcal{F}_K(\eta; \mathbb{Q}) = \mathbb{E}_{P_\eta}\left[\ln \frac{P_{\mathbb{Q}}(X^K, e_T \mid e_0)}{P_\eta(X^K)}\right],$$
which can, using simple arithmetic manipulations, be recast as
$$\mathcal{F}_K(\eta; \mathbb{Q}) = \ln P_{\mathbb{Q}}(e_T \mid e_0) - \mathrm{D}\big(P_\eta(X^K) \,\|\, P_{\mathbb{Q}}(X^K \mid e)\big).$$
We get the desired result by letting $K \to \infty$: by definition, $\lim_{K\to\infty} \mathrm{D}(P_\eta(X^K) \| P_{\mathbb{Q}}(X^K \mid e))$ is $\mathrm{D}(P_\eta(\cdot) \| P_{\mathbb{Q}}(\cdot \mid e))$. The crux of the proof is in proving the following lemma.

Lemma 3.6: $\mathcal{F}(\eta; \mathbb{Q}) = \lim_{K\to\infty} \mathcal{F}_K(\eta; \mathbb{Q})$.

Proof: Since both $P_{\mathbb{Q}}$ and $P_\eta$ are Markov processes,
$$\mathcal{F}_K(\eta; \mathbb{Q}) = \sum_{k=0}^{K-1} \mathbb{E}_{P_\eta}\big[\ln P_{\mathbb{Q}}(X^{(t_{k+1})} \mid X^{(t_k)})\big] - \sum_{k=0}^{K-1} \mathbb{E}_{P_\eta}\big[\ln P_\eta(X^{(t_k)}, X^{(t_{k+1})})\big] + \sum_{k=1}^{K-1} \mathbb{E}_{P_\eta}\big[\ln P_\eta(X^{(t_k)})\big].$$
We now express these terms as functions of $\mu_x(t)$, $\gamma_{x,y}(t)$ and $q_{x,y}$. By definition, $P_\eta(X^{(t_k)} = x) = \mu_x(t_k)$. Each of the expectations either depends on this term, or on the joint distribution $P_\eta(X^{(t_k)}, X^{(t_{k+1})})$. Using the continuity of $\gamma_{x,y}(t)$ we write
$$P_\eta(X^{(t_k)} = x, X^{(t_{k+1})} = y) = \delta_{x,y}\,\mu_x(t_k) + \Delta_K\, \gamma_{x,y}(t_k) + o(\Delta_K),$$
where $\Delta_K = T/K$. Similarly, we can also write
$$P_{\mathbb{Q}}(X^{(t_{k+1})} = y \mid X^{(t_k)} = x) = \delta_{x,y} + \Delta_K\, q_{x,y} + o(\Delta_K).$$
Finally, using properties of logarithms we have that $\ln(1 + \Delta_K z + o(\Delta_K)) = \Delta_K z + o(\Delta_K)$. Using these relations, we can rewrite, after tedious yet straightforward manipulations,
$$\mathcal{F}_K(\eta; \mathbb{Q}) = E_K(\eta; \mathbb{Q}) + H_K(\eta), \qquad E_K(\eta; \mathbb{Q}) = \Delta_K \sum_{k=0}^{K-1} e_K(t_k), \qquad H_K(\eta) = \Delta_K \sum_{k=0}^{K-1} h_K(t_k),$$
where
$$e_K(t) = \sum_x \Big[\mu_x(t)\, q_{x,x} + \sum_{y \ne x} \gamma_{x,y}(t) \ln q_{x,y}\Big] + o(\Delta_K),$$
$$h_K(t) = \sum_x \sum_{y \ne x} \gamma_{x,y}(t)\big[1 + \ln \mu_x(t) - \ln \gamma_{x,y}(t)\big] + o(\Delta_K).$$
Letting $K \to \infty$ we have that $\Delta_K \sum_k [f(t_k) + o(\Delta_K)] \to \int_0^T f(t)\, dt$, hence $E_K(\eta; \mathbb{Q})$ and $H_K(\eta)$ converge to $E(\eta; \mathbb{Q})$ and $H(\eta)$, respectively.
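Theorem 3.5 can be checked numerically in the single-process case, where the exact posterior density set is available through matrix exponentials. The following sketch (our illustration, not the paper's code) plugs the exact posterior into $\mathcal{F} = E + H$ and recovers the log-likelihood, since the KL term vanishes:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

Q = np.array([[-1.0, 1.0],
              [ 2.0, -2.0]])          # arbitrary two-state rates
T, e0, eT = 1.0, 0, 1
P = lambda t: expm(t * Q)              # transition function p_{x,y}(t)
Z = P(T)[e0, eT]                       # P_Q(e_T | e_0)

def mu(t):
    # posterior marginal: mu_x(t) = p_{e0,x}(t) p_{x,eT}(T-t) / Z
    return np.array([P(t)[e0, x] * P(T - t)[x, eT] for x in range(2)]) / Z

def gamma(t):
    # posterior transition density: gamma_{x,y}(t) = p_{e0,x}(t) q_{x,y} p_{y,eT}(T-t) / Z
    g = np.array([[P(t)[e0, x] * Q[x, y] * P(T - t)[y, eT] / Z
                   for y in range(2)] for x in range(2)])
    np.fill_diagonal(g, 0.0)
    return g

def integrand(t):
    m, g = mu(t), gamma(t)
    val = m @ np.diag(Q)                       # sum_x mu_x q_{x,x}
    for x in range(2):
        for y in range(2):
            if y != x and g[x, y] > 0:
                val += g[x, y] * np.log(Q[x, y])                       # energy
                val += g[x, y] * (1 + np.log(m[x]) - np.log(g[x, y]))  # entropy
    return val

F, _ = quad(integrand, 0.0, T, limit=200)
print(F, np.log(Z))   # the two values agree up to quadrature error
```

The integrand has a mild (integrable, logarithmic) singularity at the evidence end points, which the adaptive quadrature absorbs; this is the same end-point behavior discussed in Section 5.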
4 Factored Approximation

The variational principle we discussed is based on a representation that is as complex as the original process: the number of functions $\gamma_{x,y}(t)$ we consider is equal to the size of the original rate matrix $\mathbb{Q}$. To get a tractable inference procedure we make additional simplifying assumptions on the approximating distribution.

Given a $D$-component process we consider approximations that factor into products of independent processes. More precisely, we define $\mathcal{M}^i_e$ to be the continuous Markov-consistent density sets over the component $X_i$ that are consistent with the evidence on $X_i$ at times $0$ and $T$. Given a collection of density sets $\eta^1, \ldots, \eta^D$ for the different components, the product density set $\eta = \eta^1 \times \cdots \times \eta^D$ is defined as
$$\mu_x(t) = \prod_i \mu^i_{x_i}(t),$$
$$\gamma_{x,y}(t) = \begin{cases} \gamma^i_{x_i,y_i}(t)\,\mu^{\setminus i}_x(t) & \delta(x,y) = \{i\} \\ \sum_i \gamma^i_{x_i,x_i}(t)\,\mu^{\setminus i}_x(t) & x = y \\ 0 & \text{otherwise,} \end{cases}$$
where $\mu^{\setminus i}_x(t) = \prod_{j \ne i} \mu^j_{x_j}(t)$ is the joint distribution at time $t$ of all the components other than the $i$'th. (It is not hard to see that if $\eta^i \in \mathcal{M}^i_e$ for all $i$, then $\eta \in \mathcal{M}_e$.) We define the set $\mathcal{M}^F_e$ to contain all factored density sets. From now on we assume that $\eta = \eta^1 \times \cdots \times \eta^D \in \mathcal{M}^F_e$.

Assuming that $\mathbb{Q}$ is defined by a CTBN, and that $\eta$ is a factored density set, we can rewrite
$$E(\eta; \mathbb{Q}) = \sum_i \int_0^T \sum_{x_i} \mu^i_{x_i}(t)\, \mathbb{E}_{\mu^{\setminus i}(t)}\big[q_{x_i,x_i|U_i}\big]\, dt + \sum_i \int_0^T \sum_{x_i} \sum_{y_i \ne x_i} \gamma^i_{x_i,y_i}(t)\, \mathbb{E}_{\mu^{\setminus i}(t)}\big[\ln q_{x_i,y_i|U_i}\big]\, dt,$$
and
$$H(\eta) = \sum_i H(\eta^i).$$
This decomposition involves only local terms that either include the $i$'th component, or include the $i$'th component and its parents in the CTBN defining $\mathbb{Q}$. Note that terms such as $\mathbb{E}_{\mu^{\setminus i}(t)}[q_{x_i,x_i|U_i}]$ involve only $\mu^j(t)$ for $j \in \mathrm{Pa}_i$. To make the factored nature of the approximation explicit in the notation, we write henceforth $\mathcal{F}(\eta; \mathbb{Q}) = \mathcal{F}^F(\eta^1, \ldots, \eta^D; \mathbb{Q})$.

Fixed Point Characterization  We can now pose the optimization problem we wish to solve: Fixing $i$, and given $\eta^1, \ldots, \eta^{i-1}, \eta^{i+1}, \ldots, \eta^D$ in $\mathcal{M}^1_e, \ldots, \mathcal{M}^{i-1}_e, \mathcal{M}^{i+1}_e, \ldots, \mathcal{M}^D_e$, respectively, find
$$\arg\max_{\eta^i \in \mathcal{M}^i_e} \mathcal{F}^F(\eta^1, \ldots, \eta^D; \mathbb{Q}).$$
If for all $i$ we have an $\eta^i \in \mathcal{M}^i_e$ which solves this optimization problem with respect to each component, then we have a (local) stationary point of the energy functional within $\mathcal{M}^F_e$.

To solve this optimization problem, we define a Lagrangian, which includes the constraints in the form of Def. 3.1. The Lagrangian is a functional of the functions $\mu^i_{x_i}(t)$ and $\gamma^i_{x_i,y_i}(t)$ and of Lagrange multipliers (which are functions of $t$ as well). The stationary point of the functional satisfies the Euler-Lagrange equations, namely the functional derivatives of $\mathcal{L}$ vanish. Writing these equations in explicit form we get a fixed-point characterization of the solution in terms of the following set of ODEs:
$$\frac{d}{dt}\mu^i_{x_i}(t) = \sum_{y_i \ne x_i} \big(\gamma^i_{y_i,x_i}(t) - \gamma^i_{x_i,y_i}(t)\big),$$
$$\frac{d}{dt}\rho^i_{x_i}(t) = -\rho^i_{x_i}(t)\big(\bar{q}^{\,i}_{x_i,x_i}(t) + \psi^i_{x_i}(t)\big) - \sum_{y_i \ne x_i} \rho^i_{y_i}(t)\, \tilde{q}^{\,i}_{x_i,y_i}(t), \qquad (3)$$
where the $\rho^i$ are the exponents of the Lagrange multipliers.

In addition we have the following algebraic constraint:
$$\rho^i_{x_i}(t)\,\gamma^i_{x_i,y_i}(t) = \mu^i_{x_i}(t)\,\tilde{q}^{\,i}_{x_i,y_i}(t)\,\rho^i_{y_i}(t), \qquad x_i \ne y_i. \qquad (4)$$
In these equations we use the following shorthand notations for the averaged rates,
$$\bar{q}^{\,i}_{x_i,y_i}(t) = \mathbb{E}_{\mu^{\setminus i}(t)}\big[q^{i|\mathrm{Pa}_i}_{x_i,y_i|U_i}\big], \qquad \bar{q}^{\,i}_{x_i,y_i|x_j}(t) = \mathbb{E}_{\mu^{\setminus i}(t)}\big[q^{i|\mathrm{Pa}_i}_{x_i,y_i|U_i} \,\big|\, x_j\big].$$
Similarly, we have the following shorthand notations for the geometrically-averaged rates,
$$\tilde{q}^{\,i}_{x_i,y_i}(t) = \exp\Big\{\mathbb{E}_{\mu^{\setminus i}(t)}\big[\ln q^{i|\mathrm{Pa}_i}_{x_i,y_i|U_i}\big]\Big\}, \qquad \tilde{q}^{\,i}_{x_i,y_i|x_j}(t) = \exp\Big\{\mathbb{E}_{\mu^{\setminus i}(t)}\big[\ln q^{i|\mathrm{Pa}_i}_{x_i,y_i|U_i} \,\big|\, x_j\big]\Big\}.$$
The last auxiliary term is
$$\psi^i_{x_i}(t) = \sum_{j \in \mathrm{Children}_i} \sum_{x_j} \mu^j_{x_j}(t)\, \bar{q}^{\,j}_{x_j,x_j|x_i}(t) + \sum_{j \in \mathrm{Children}_i} \sum_{x_j} \sum_{y_j \ne x_j} \gamma^j_{x_j,y_j}(t)\, \ln \tilde{q}^{\,j}_{x_j,y_j|x_i}(t).$$

The two differential equations (3) for $\mu^i_{x_i}(t)$ and $\rho^i_{x_i}(t)$ describe, respectively, the progression of $\mu^i_{x_i}$ forward and the progression of $\rho^i_{x_i}$ backward. To uniquely solve these equations we need to set the boundary conditions. The boundary condition for $\mu^i_{x_i}$ is defined explicitly in $\mathcal{M}^F_e$ as
$$\mu^i_{x_i}(0) = \delta_{x_i, e_{i,0}}. \qquad (5)$$
The boundary condition at $T$ is slightly more involved. The constraints in $\mathcal{M}^F_e$ imply that $\mu^i_{x_i}(T) = \delta_{x_i, e_{i,T}}$. As stated by Lemma 3.4, we have that $\gamma^i_{e_{i,T}, x_i}(T) = 0$ when $x_i \ne e_{i,T}$. Plugging these values into (4), and assuming that $\mathbb{Q}$ is irreducible, we get that $\rho^i_{x_i}(T) = 0$ for all $x_i \ne e_{i,T}$. In addition, we notice that $\rho^i_{e_{i,T}}(T) \ne 0$, for otherwise the whole system of equations for $\rho$ would collapse to 0. Finally, notice that the solution of (3) for $\mu^i$ and $\gamma^i$ is insensitive to multiplication of $\rho^i$ by a constant. Thus, we can arbitrarily set $\rho^i_{e_{i,T}}(T) = 1$ and get the boundary condition
$$\rho^i_{x_i}(T) = \delta_{x_i, e_{i,T}}. \qquad (6)$$

Theorem 4.1: $\eta^i \in \mathcal{M}^i_e$ is a stationary point (e.g., a local maximum) of $\mathcal{F}^F(\eta^1, \ldots, \eta^D; \mathbb{Q})$ subject to the constraints of Def. 3.1 if and only if it satisfies (3)-(6).

It is straightforward to extend this result to show that at a maximum with respect to all the component densities, this fixed-point characterization must hold for all components simultaneously.
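The averaged and geometrically-averaged rates are simple expectations under the mean-field marginals of the parents. A minimal sketch (ours, for a component with a single binary parent; the numeric values are invented), which also shows the expectations appearing in the factored energy above:

```python
import numpy as np

def averaged_rates(cond_q, parent_marginal):
    """cond_q[u] is the rate matrix of X_i given parent state u;
    parent_marginal[u] = mu^j_u(t). Returns (qbar, qtilde)."""
    # arithmetic average: qbar = sum_u mu_u * q(.|u)
    qbar = sum(p * q for p, q in zip(parent_marginal, cond_q))
    # geometric average applies to the (positive) off-diagonal entries;
    # the diagonal is replaced by 1 inside the log and zeroed afterwards
    logs = sum(p * np.log(np.where(q > 0, q, 1.0))
               for p, q in zip(parent_marginal, cond_q))
    qtilde = np.exp(logs)
    np.fill_diagonal(qtilde, 0.0)   # diagonal terms enter via qbar instead
    return qbar, qtilde

cond_q = [np.array([[-0.5, 0.5], [3.0, -3.0]]),
          np.array([[-4.0, 4.0], [0.1, -0.1]])]
qbar, qtilde = averaged_rates(cond_q, parent_marginal=[0.7, 0.3])
```

Because the parents' marginals $\mu^j(t)$ change over time, both averages are genuinely time-dependent, which is what makes the approximating processes inhomogeneous.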
Example 4.2: Consider the case of a single component, for which our procedure should be exact, as no simplifying assumptions are made on the density set. In this case, the averaged rates $\bar{q}^{\,i}$ and the geometrically-averaged rates $\tilde{q}^{\,i}$ both reduce to the unaveraged rates $q$, and $\psi \equiv 0$. Thus, the system of equations to be solved is
$$\frac{d}{dt}\mu_x(t) = \sum_{y \ne x}\big(\gamma_{y,x}(t) - \gamma_{x,y}(t)\big), \qquad \frac{d}{dt}\rho_x(t) = -\sum_y q_{x,y}\,\rho_y(t),$$
along with the algebraic equation
$$\rho_x(t)\,\gamma_{x,y}(t) = q_{x,y}\,\mu_x(t)\,\rho_y(t), \qquad y \ne x.$$
In this case, it is straightforward to show that the backward propagation rule for $\rho_x$ implies that $\rho_x(t) = \Pr(e_T \mid X^{(t)} = x)$. This system of ODEs is similar to forward-backward propagation, except that unlike classical forward propagation (which would use a function such as $\alpha_x(t) = \Pr(X^{(t)} = x \mid e_0)$), here the forward propagation already takes the backward messages into account, to directly compute the posterior. Given this interpretation, it is clear that integrating $\rho_x(t)$ from $T$ to $0$, followed by integrating $\mu_x(t)$ from $0$ to $T$, computes the exact posterior of the process.

This interpretation of $\rho_x(t)$ also allows us to understand the role of $\gamma_{x,y}(t)$. Recall that $\gamma_{x,y}(t)/\mu_x(t)$ is the instantaneous rate of transition from $x$ to $y$ at time $t$. Thus,
$$\frac{\gamma_{x,y}(t)}{\mu_x(t)} = q_{x,y}\,\frac{\rho_y(t)}{\rho_x(t)}.$$
That is, the instantaneous rate combines the original rate with the relative likelihood of the evidence at $T$ given $y$ and $x$. If $y$ is much more likely to lead to the final state, then the rates are biased toward $y$. Conversely, if $y$ is unlikely to lead to the evidence, the rate of transitions into it is lower. This observation also explains why the forward propagation of $\mu_x$ will reach the observed $\mu_x(T)$ even though we did not impose it explicitly.
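The following runnable sketch (ours, with arbitrary rates and evidence) carries out the backward-forward pass of Example 4.2 with an adaptive Runge-Kutta solver, as the implementation strategy described later suggests:

```python
import numpy as np
from scipy.integrate import solve_ivp

Q = np.array([[-1.0, 1.0],
              [ 2.0, -2.0]])   # two-state rate matrix, illustrative values
T, e0, eT = 1.0, 0, 1

# Backward pass: d/dt rho(t) = -Q rho(t), rho(T) = delta_{eT}.
# The substitution s = T - t turns this into dr/ds = Q r, integrated forward.
back = solve_ivp(lambda s, r: Q @ r, (0.0, T), np.eye(2)[eT],
                 dense_output=True, rtol=1e-8, atol=1e-10)
rho = lambda t: back.sol(T - t)   # rho_x(t) = Pr(e_T | X^(t) = x)

def dmu(t, mu_t):
    # Forward pass: d/dt mu_x = sum_{y != x}(gamma_{y,x} - gamma_{x,y}),
    # with gamma_{x,y}(t) = q_{x,y} mu_x(t) rho_y(t) / rho_x(t).
    r = rho(t)
    g = Q * mu_t[:, None] * r[None, :] / r[:, None]
    np.fill_diagonal(g, 0.0)
    return g.sum(axis=0) - g.sum(axis=1)

# Stop just short of T: the instantaneous rates diverge at the end point
# (see the discussion of rho above), although mu itself stays bounded.
fwd = solve_ivp(dmu, (0.0, T - 1e-6), np.eye(2)[e0], dense_output=True,
                rtol=1e-8, atol=1e-10)
print(fwd.sol(T - 1e-6))   # posterior marginal; approximately delta_{eT}
```

In this single-component setting the result is exact; in the multi-component case the same per-component pass is run with the averaged rates $\bar{q}^{\,i}$, $\tilde{q}^{\,i}$ in place of $q$.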

Example 4.3: We define an Ising chain to be a CTBN $X_1 - X_2 - \cdots - X_D$ such that each binary component prefers to be in the same state as its neighbors. These models are governed by two parameters: a coupling parameter $\beta$, which determines the strength of the coupling between two neighboring components, and a rate parameter $\tau$, which determines the propensity of each component to change its state. More formally, we define the conditional rate matrices as
$$q^{i|\mathrm{Pa}_i}_{x_i,y_i|u_i} = \tau\Big(1 + e^{-2 y_i \beta \sum_{j \in \mathrm{Pa}_i} x_j}\Big)^{-1},$$
where $x_j \in \{-1, 1\}$.

As an example, we consider a two-component Ising chain with initial state $X^{(0)}_1 = -1$ and $X^{(0)}_2 = 1$, and a reversed state at the final time, $X^{(T)}_1 = 1$ and $X^{(T)}_2 = -1$. For a large value of $\beta$, this evidence is unlikely, as at both end points the components are in undesired configurations.

Figure 1: Numerical results for the two-component Ising chain described in Example 4.3, where the first component starts in state $-1$ and ends at time $T = 1$ in state $1$. The second component has the opposite behavior. (top) Two likely trajectories depicting the two modes of the model. (middle) Exact (solid) and approximate (dashed/dotted) marginals $\mu^i_1(t)$. (bottom) The log ratio $\log \rho^i_1(t)/\rho^i_0(t)$.

The exact posterior assigns higher probabilities to trajectories where one of the components switches relatively fast to match the other, and then, toward the end of the interval, they separate to match the evidence. Since the model is symmetric, these trajectories are either ones in which both components are most of the time in state $-1$, or ones where both are most of the time in state $1$ (Fig. 1, top). Due to symmetry, the marginal probability of each component is around 0.5 throughout most of the interval (Fig. 1, middle). The variational approximation cannot capture the dependency between the two components, and thus converges to one of two local maxima, corresponding to the two potential subsets of trajectories. Examining the values of $\rho^i$, we see that close to the end of the interval they bias the instantaneous rates significantly (Fig. 1, bottom).

This example also allows us to examine the implications of modeling the posterior by inhomogeneous Markov processes. In principle, we might have used as an approximation Markov processes with homogeneous rates, conditioned on the evidence. To examine whether our approximation behaves in this manner, we notice that in the single-component case we have
$$q_{x,y} = \frac{\rho_x(t)\,\gamma_{x,y}(t)}{\rho_y(t)\,\mu_x(t)},$$
which should be constant. Consider the analogous quantity in the multi-component case: $\tilde{q}^{\,i}_{x_i,y_i}(t)$, the geometric average of the rate of $X_i$, given the probability of the parents' state. Not surprisingly, this is exactly a mean field approximation, where the influence of interacting components is approximated by their average influence. Since the distribution of the parents (in the two-component system, the other component) changes in time, these rates change continuously, especially near the ends of the time interval. This suggests that a piecewise-homogeneous approximation cannot capture the dynamics without a loss in accuracy.
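For reference, the Ising-chain rate of Example 4.3 in code form (our sketch; the $\beta$ and $\tau$ values are arbitrary):

```python
import numpy as np

def ising_rate(y_i, neighbour_states, beta=2.0, tau=1.0):
    """Rate of flipping into state y_i, given the neighbours' states in {-1,+1}:
    tau * (1 + exp(-2 * y_i * beta * sum_j x_j))^(-1)."""
    return tau / (1.0 + np.exp(-2.0 * y_i * beta * sum(neighbour_states)))

# Agreement is preferred: flipping toward the neighbour's state is fast,
# flipping away from it is slow, and the contrast grows with beta.
print(ising_rate(+1, [+1]))   # ~0.98 * tau: moving to match the neighbour
print(ising_rate(-1, [+1]))   # ~0.02 * tau: moving to disagree
```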
Optimization Procedure  If $\mathbb{Q}$ is irreducible, then $\rho^i_{x_i}$ and $\mu^i_{x_i}$ are non-zero throughout the open interval $(0, T)$. As a result, we can solve (4) to express $\gamma^i_{x_i,y_i}$ as a function of $\mu^i$ and $\rho^i$, thus eliminating it from (3) to get evolution equations solely in terms of $\mu^i$ and $\rho^i$. Abstracting the details, we obtain a set of ODEs of the form
$$\frac{d}{dt}\mu^i(t) = \alpha\big(\mu^i(t), \rho^i(t), \mu^{\setminus i}(t)\big), \qquad \mu^i(0) \text{ given},$$
$$\frac{d}{dt}\rho^i(t) = \beta\big(\rho^i(t), \mu^{\setminus i}(t)\big), \qquad \rho^i(T) \text{ given},$$
where $\alpha$ and $\beta$ can be inferred from (3) and (4). Since the evolution of $\rho^i$ does not depend on $\mu^i$, we can integrate backward from time $T$ to solve for $\rho^i$. Then, integrating forward from time $0$, we compute $\mu^i$. After performing a single iteration of backward-forward integration, we obtain a solution that satisfies the fixed-point equation (3) for the $i$'th component. (This is not surprising once we have identified our procedure to be a variation of a standard forward-backward algorithm for a single component.) Such a solution will be a local maximum of the functional with respect to $\eta^i$ (reaching a local minimum or a saddle point requires very specific initialization points).

This suggests that we can use the standard procedure of asynchronous updates, where we update each component in a round-robin fashion (a sketch of such a sweep appears below). Since each of these single-component updates converges in one backward-forward step, and since it reaches a local maximum, each step improves the value of the free energy over the previous one. Since the free energy functional is bounded by the probability of the evidence, this procedure will always converge.

Another issue is the initialization of this procedure. Since the iteration over the $i$'th component depends on $\mu^{\setminus i}$, we need to initialize $\mu$ by some legal assignment. To do so, we create a fictional rate matrix $\tilde{\mathbb{Q}}^i$ for each component and initialize $\mu^i$ to be the posterior of the process given the evidence $e_{i,0}$ and $e_{i,T}$. As a reasonable initial guess, we choose at random one of the conditional rates in $\mathbb{Q}$ to determine the fictional rate matrix.
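A high-level sketch of the asynchronous sweep (our paraphrase; the callables init_marginals, update_component and free_energy are hypothetical stand-ins for the machinery described above):

```python
def mean_field_sweep(num_components, init_marginals, update_component,
                     free_energy, tol=1e-6, max_sweeps=100):
    """Round-robin coordinate ascent on the factored free energy."""
    mus = init_marginals()               # e.g., posteriors of fictional rate matrices
    prev = -float('inf')
    for _ in range(max_sweeps):
        for i in range(num_components):  # round-robin over components
            # one backward integration of rho_i, then one forward
            # integration of mu_i, holding mu_{\i} fixed
            mus[i] = update_component(i, mus)
        f = free_energy(mus)             # lower bound on ln P(e); never decreases
        if f - prev < tol:               # monotone and bounded, hence convergent
            return mus
        prev = f
    return mus
```

Each inner update is exactly the backward-forward pass sketched after Example 4.2, with the averaged rates recomputed from the current neighbours' marginals.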

Figure 2: (a) Relative error as a function of the coupling parameter $\beta$ (x-axis) and the transition rate $\tau$ (y-axis) for an 8-component Ising chain. (b) Comparison of true vs. estimated likelihood as a function of the rate parameter $\tau$. (c) Comparison of true vs. estimated likelihood as a function of the coupling parameter $\beta$.

The continuous-time update equations allow us to use standard ODE methods with an adaptive step size (here we use the Runge-Kutta-Fehlberg (4,5) method). At the price of some overhead, these procedures automatically tune the trade-off between error and time granularity.

5 Perspective & Related Works

Variational approximations for different types of continuous-time processes have recently been proposed [12, 13]. Our approach is motivated by the results of Opper and Sanguinetti [12], who developed a variational principle for a related model. Their model, which they call a Markov jump process, is similar to an HMM, in which the hidden chain is a continuous-time Markov process and there are (noisy) observations at discrete points along the process. They describe a variational principle and discuss the form of the functional when the approximation is a product of independent processes.

There are two main differences between the setting of Opper and Sanguinetti and ours. First, we show how to exploit the structure of the target CTBN to reduce the complexity of the approximation. These simplifications imply that the update of the $i$'th process depends only on its Markov blanket in the CTBN, allowing us to develop efficient approximations for large models. Second, and more importantly, the structure of the evidence in our setting is quite different, as we assume deterministic evidence at the ends of intervals. This setting typically leads to a posterior Markov process in which the instantaneous rates used by Opper and Sanguinetti diverge toward the end point (the rates of transition into the observed state go to infinity), leading to numerical problems at the end points. We circumvent this problem by using the marginal density representation, which is much more stable numerically.

Taking the general perspective of Wainwright and Jordan [15], the representation of the distribution uses the natural sufficient statistics. In the case of a continuous-time Markov process, the sufficient statistics are $T_x$, the time spent in state $x$, and $M_{x,y}$, the number of transitions from state $x$ to $y$. In a discrete-time model, we can capture the statistics for every random variable. In a continuous-time model, however, we need to consider the time derivatives of the statistics. Indeed, it is not hard to show that $\frac{d}{dt}\mathbb{E}[T_x(t)] = \mu_x(t)$ and $\frac{d}{dt}\mathbb{E}[M_{x,y}(t)] = \gamma_{x,y}(t)$. Thus, our marginal density sets $\eta$ provide what we consider a natural formulation for variational approaches to continuous-time Markov processes.

Our presentation focused on evidence at the two ends of an interval. Our formulation easily extends to deal with more elaborate types of evidence: (1) If we do not observe the initial state of the $i$'th component, we can set $\mu^i_x(0)$ to be the prior probability of $X^{(0)}_i = x$. Similarly, if we do not observe $X_i$ at time $T$, we set $\rho^i_x(T) = 1$ as initial data for the backward step. (2) In a CTBN where one (or more) components are fully observed, we simply set $\mu^i$ for these components to be a distribution that assigns all the probability mass to the observed trajectory. Similarly, if we observe different components at different times, we may update each component on a different time interval.
Consequently, maintaining for each component a marginal distribution $\mu^i$ throughout the interval of interest, we can update the other components using their own evidence patterns.

6 Experimental Evaluation

To gain better insight into the quality of our procedure, we performed numerical tests on models that challenge the approximation. Specifically, we used Ising chains in which we explore regimes defined by the degree of coupling between the components (the parameter $\beta$) and the rate of transitions (the parameter $\tau$). We evaluate the error in two ways. The first is the difference between the true log-likelihood and our estimate. The second is the average relative error in the estimates of the different expected sufficient statistics, defined by $\frac{1}{N}\sum_j |\hat{\theta}_j - \theta_j| / \theta_j$, where $\theta_j$ is the exact value of the $j$'th expected sufficient statistic, $\hat{\theta}_j$ is the approximation, and $N$ is the number of statistics.

Applying our procedure to an Ising chain with 8 components, for which we can still perform exact inference, we evaluated the relative error for different choices of $\beta$ and $\tau$. The evidence in this experiment is $e_0 = \{+,+,+,+,+,+,-,-\}$, $T = 0.64$, and $e_T = \{-,-,-,+,+,+,+,+\}$. As shown in Fig. 2a, the error is larger when $\tau$ and $\beta$ are large. In the case of weak coupling (small $\beta$), the posterior is almost independent, and our approximation is accurate. In models with few transitions (small $\tau$), most of the mass of the posterior is concentrated on a few canonical types of trajectories that can be captured by the approximation (as in Example 4.3). At high transition rates, the components tend to transition often, and in a coordinated manner, which leads to a posterior that is hard to approximate by a product distribution. Moreover, the resulting free energy landscape is rough, with many local maxima. Examining the error in likelihood estimates (Fig. 2b,c), we see a similar trend.

Next, we examined the run time of our approximation when using a fairly standard ODE solver with few optimizations and tunings. The run time is dominated by the time needed to perform the backward-forward integration when updating a single component, and by the number of such updates until convergence. Examining the run time for different choices of $\beta$ and $\tau$ (Fig. 3), we see that the run time of our procedure scales linearly with the number of components in the chain. Moreover, the run time is generally insensitive to the difficulty of the problem in terms of $\beta$. It does depend to some extent on the rate $\tau$, suggesting that processes with more transitions require more iterations to converge. Indeed, the number of iterations required to achieve convergence in the largest chains under consideration is only mildly affected by the parameter choices. The scalability of the run time stands in contrast to the Gibbs sampling procedure [4], which scales roughly with the number of transitions in the sampled trajectories.

Figure 3: Evaluation of the run time of the approximation versus the run time of exact inference as a function of the number of components.

Comparing our method to the Gibbs sampling procedure, we see (Fig. 4) that the faster Mean Field approach dominates the Gibbs procedure over short run times. However, as opposed to Mean Field, the Gibbs procedure is asymptotically unbiased, and with longer run times it ultimately prevails. This evaluation also shows that the adaptive integration procedure in our method strikes a better trade-off than using a fixed time-granularity integration.

7 Inference on Trees

The above experimental results indicate that our approximation is accurate when reasoning about weakly-coupled components, or about time intervals involving few transitions (low transition rates). Unfortunately, in many domains we face strongly-coupled components. For example, we are interested in modeling the evolution of biological sequences (DNA, RNA, and proteins). In such systems, we have a phylogenetic tree that represents the branching process that leads to present-day sequences (see Fig. 5a). It is common in sequence evolution to model this process as a continuous-time Markov process over a tree [6]. More precisely, the evolution along each branch is a standard continuous-time Markov process, and branching is modeled by a replication, after which each replica evolves independently along its sub-branch. Common applications are forced to assume that each character in the sequence evolves independently of the others.
In some situations, assuming independent evolution of each character is highly unreasonable. Consider the evolution of an RNA sequence that folds onto itself to form a functional structure. This folding is mediated by complementary base-pairing (A-U, C-G, etc.) that stabilizes the structure. During evolution, we expect to see compensatory mutations: if an A changes into a C, then its base-paired U will change into a G soon thereafter. To capture such coordinated changes, we need to consider the joint evolution of the different characters. In the case of RNA structure, the stability of the structure is determined by stacking potentials that measure the stability of two adjacent pairs of interacting nucleotides. Thus, if we consider a factor network to represent the energy of a fold, it will have the structure shown in Fig. 5b. We can convert this factor graph into a CTBN using procedures that consider the energy function as a fitness criterion in evolution [3, 16]. Unfortunately, inference in such models suffers from computational blowup, and so the few studies that deal with it explicitly resort to sampling procedures [16].

To consider trees, we need to extend our framework to deal with branching processes. In a linear-time model, we view the process as a map from $[0, T]$ into random variables $X^{(t)}$. In the case of a tree, we view the process as a map from a point $t = \langle b, t \rangle$ on a tree $\mathcal{T}$ (defined by the branch $b$ and the time $t$ within it) into a random variable $X^{(t)}$. Similarly, we generalize the definition of the Markov-consistent density set $\eta$ to include functions on trees. We define continuity of functions on trees in the obvious manner. The variational approximation on trees is thus similar to the one on intervals.

Figure 4: Evaluation of the run time vs. accuracy trade-off for several choices of parameters for Mean Field and Gibbs sampling on the branching process of Fig. 5(a).

Figure 5: (a) An example of a phylogenetic tree. Branch lengths denote time intervals between events, with the interval used for the comparison in Fig. 6a highlighted. (b) The form of the energy function for encoding RNA folding, superimposed on a fragment of a folded structure; each gray box denotes a term that involves four nucleotides. (c) Illustration of the ODE updates on a directed tree.

Figure 6: (a) Comparison of exact vs. approximate inference along the branch from C to D in the tree of Fig. 5(a), with and without additional evidence at the other leaves. Exact marginals are shown as solid lines, approximate marginals as dashed lines. The two panels show two different components. (b) Evaluation of the relative error in expected sufficient statistics for an Ising chain in branching time; compare to Fig. 2(a). (c) Evaluation of the estimated likelihood on a tree; compare to Fig. 2(b).

Within each branch, we deal with the same update formulas as in linear time. We denote by $\mu^i_{x_i}(b, t)$ and $\rho^i_{x_i}(b, t)$ the messages computed on branch $b$ at time $t$. The only changes occur at vertices. Suppose we have a branch $b_1$ of length $T_1$ incoming into vertex $v$, and two outgoing branches $b_2$ and $b_3$ (see Fig. 5c). Then we use the following updates for $\mu^i_{x_i}$ and $\rho^i_{x_i}$:
$$\mu^i_{x_i}(b_k, 0) = \mu^i_{x_i}(b_1, T_1), \quad k = 2, 3,$$
$$\rho^i_{x_i}(b_1, T_1) = \rho^i_{x_i}(b_2, 0)\,\rho^i_{x_i}(b_3, 0).$$
The forward propagation of $\mu^i$ simply uses the value at the end of the incoming branch as the initial value for the outgoing branches. In the backward propagation of $\rho^i$, the value at the end of $b_1$ is the product of the values at the start of the two outgoing branches. This is the natural operation when we recall the interpretation of $\rho^i$ as the probability of the downstream evidence given the current state. (A code sketch of these vertex updates appears at the end of this section.)

When switching to trees, we increase the amount of evidence about intermediate states. Consider, for example, the tree of Fig. 5a. We can view the span from C to D as an interval with evidence at its ends. When we add evidence at the tips of other branches, we gain more information about intermediate points between C and D. To evaluate the impact of these changes on our approximation, we considered the tree of Fig. 5a, and compared it to inference on the backbone between C and D (Fig. 2). Comparing the true marginals to the approximate ones along the main backbone (see Fig. 6a), we see a major difference in the quality of the approximation. The evidence in the tree leads to a much tighter approximation of the marginal distribution. A more systematic comparison (Fig. 6b,c) demonstrates that the additional evidence reduces the magnitude of the error throughout the parameter space.

As a more demanding test, we applied our inference procedure to the model introduced by Yu and Thorne [16] for a stem of 18 interacting RNA nucleotides in 8 species in the phylogeny of Fig. 5a. We compared our estimates of the expected sufficient statistics of this model to those obtained by the Gibbs sampling procedure.
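To make the vertex operations concrete, here is a minimal sketch (ours; the message containers are hypothetical numpy arrays) of the two updates at a branching vertex:

```python
import numpy as np

def vertex_update_mu(mu_end_b1, outgoing_branches):
    """Forward: mu_i(b_k, 0) = mu_i(b_1, T_1) for each outgoing branch b_k,
    i.e., the marginal at the branch point is simply replicated."""
    return {b: mu_end_b1.copy() for b in outgoing_branches}

def vertex_update_rho(rho_start_b2, rho_start_b3):
    """Backward: rho_i(b_1, T_1) = rho_i(b_2, 0) * rho_i(b_3, 0), the
    probability of all downstream evidence given the state at the vertex."""
    return rho_start_b2 * rho_start_b3

mu_out = vertex_update_mu(np.array([0.3, 0.7]), outgoing_branches=['b2', 'b3'])
rho_in = vertex_update_rho(np.array([0.9, 0.1]), np.array([0.5, 0.5]))
```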

Figure 7: Comparison of estimates of expected sufficient statistics in the evolution of 18 interacting nucleotides, using a realistic model of RNA evolution. Each point is an expected statistic value; the x-axis is the estimate by the variational procedure, whereas the y-axis is the estimate by Gibbs sampling.

The results, shown in Fig. 7, demonstrate that overall the two approximate inference procedures are in good agreement about the values of the expected sufficient statistics.

8 Discussion

In this paper we formulated a general variational principle for continuous-time Markov processes (by reformulating and extending the one proposed by Opper and Sanguinetti [12]), and used it to derive an efficient procedure for inference in CTBNs. In this mean field-type approximation, we use a product of independent inhomogeneous processes to approximate the multi-component posterior. Our procedure enjoys the same benefits encountered in the discrete-time mean field procedure [8]: it provides a lower bound on the likelihood of the evidence, and its run time scales linearly with the number of components. Using asynchronous updates it is guaranteed to converge, and the approximation represents a consistent joint distribution. It also suffers from the expected shortcomings: there are multiple local maxima, and it cannot capture certain complex interactions in the posterior. By using a time-inhomogeneous representation, our approximation does capture complex patterns in the temporal progression of the marginal distribution of each component. Importantly, the continuous-time parametrization enables a straightforward implementation using standard ODE integration packages that automatically tune the trade-off between time granularity and approximation quality. We showed how to extend the method to perform inference on phylogenetic trees, and showed that it provides fairly accurate answers in the context of a real application.

One of the key developments here is the shift from (piecewise) homogeneous parametric representations to continuously inhomogeneous representations based on marginal density sets. This shift increases the flexibility of the approximation and, somewhat surprisingly, also significantly simplifies the resulting formulation. A possible extension of the ideas set out here is to use our variational procedure to generate an initial distribution for Gibbs sampling, skipping the initial burn-in phase and producing accurate samples. Another attractive aspect of this new variational approximation is its potential use for learning model parameters from data. It can easily be combined with the EM procedure for CTBNs [10] to obtain a Variational-EM procedure for CTBNs, which monotonically increases the likelihood by alternating between steps that improve the approximation $\eta$ (the updates discussed here) and steps that improve the model parameters $\theta$.

Acknowledgments

We thank the anonymous reviewers for helpful remarks on previous versions of the manuscript. This research was supported in part by a grant from the Israel Science Foundation. Tal El-Hay is supported by the Eshkol fellowship from the Israeli Ministry of Science.

References

[1] X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In UAI, 1998.
[2] K. L. Chung. Markov Chains with Stationary Transition Probabilities. Springer, 1960.
[3] T. El-Hay, N. Friedman, D. Koller, and R. Kupferman. Continuous time Markov networks. In UAI, 2006.
[4] T. El-Hay, N. Friedman, and R. Kupferman. Gibbs sampling in factorized continuous-time Markov processes. In UAI, 2008.
[5] Y. Fan and C. R. Shelton. Sampling for approximate inference in continuous time Bayesian networks. In AI and Math, 2008.
[6] J. Felsenstein. Inferring Phylogenies. Sinauer Associates, 2004.
[7] C. W. Gardiner. Handbook of Stochastic Methods. Springer.
[8] M. I. Jordan, Z. Ghahramani, T. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In Learning in Graphical Models, 1998.
[9] U. Nodelman, C. R. Shelton, and D. Koller. Continuous time Bayesian networks. In UAI, 2002.
[10] U. Nodelman, C. R. Shelton, and D. Koller. Expectation maximization and complex duration distributions for continuous time Bayesian networks. In UAI, 2005.
[11] U. Nodelman, C. R. Shelton, and D. Koller. Expectation propagation for continuous time Bayesian networks. In UAI, 2005.
[12] M. Opper and G. Sanguinetti. Variational inference for Markov jump processes. In NIPS, 2007.
[13] C. Archambeau, M. Opper, Y. Shen, D. Cornford, and J. Shawe-Taylor. Variational inference for diffusion processes. In NIPS, 2007.
[14] S. Saria, U. Nodelman, and D. Koller. Reasoning at the right time granularity. In UAI, 2007.
[15] M. J. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn., 1:1-305, 2008.
[16] J. Yu and J. L. Thorne. Dependence among sites in RNA evolution. Mol. Biol. Evol., 23, 2006.


More information

Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling

Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling Case Stuy : Document Retrieval Collapse Gibbs an Variational Methos for LDA Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 7 th, 0 Example

More information

Designing Information Devices and Systems II Fall 2017 Note Theorem: Existence and Uniqueness of Solutions to Differential Equations

Designing Information Devices and Systems II Fall 2017 Note Theorem: Existence and Uniqueness of Solutions to Differential Equations EECS 6B Designing Information Devices an Systems II Fall 07 Note 3 Secon Orer Differential Equations Secon orer ifferential equations appear everywhere in the real worl. In this note, we will walk through

More information

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13)

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13) Slie10 Haykin Chapter 14: Neuroynamics (3r E. Chapter 13) CPSC 636-600 Instructor: Yoonsuck Choe Spring 2012 Neural Networks with Temporal Behavior Inclusion of feeback gives temporal characteristics to

More information

Permanent vs. Determinant

Permanent vs. Determinant Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an

More information

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations Optimize Schwarz Methos with the Yin-Yang Gri for Shallow Water Equations Abessama Qaouri Recherche en prévision numérique, Atmospheric Science an Technology Directorate, Environment Canaa, Dorval, Québec,

More information

2Algebraic ONLINE PAGE PROOFS. foundations

2Algebraic ONLINE PAGE PROOFS. foundations Algebraic founations. Kick off with CAS. Algebraic skills.3 Pascal s triangle an binomial expansions.4 The binomial theorem.5 Sets of real numbers.6 Surs.7 Review . Kick off with CAS Playing lotto Using

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

θ x = f ( x,t) could be written as

θ x = f ( x,t) could be written as 9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)

More information

Perturbation Analysis and Optimization of Stochastic Flow Networks

Perturbation Analysis and Optimization of Stochastic Flow Networks IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. XX, NO. Y, MMM 2004 1 Perturbation Analysis an Optimization of Stochastic Flow Networks Gang Sun, Christos G. Cassanras, Yorai Wari, Christos G. Panayiotou,

More information

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Course Project for CDS 05 - Geometric Mechanics John M. Carson III California Institute of Technology June

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Cascaded redundancy reduction

Cascaded redundancy reduction Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,

More information

A. Incorrect! The letter t does not appear in the expression of the given integral

A. Incorrect! The letter t does not appear in the expression of the given integral AP Physics C - Problem Drill 1: The Funamental Theorem of Calculus Question No. 1 of 1 Instruction: (1) Rea the problem statement an answer choices carefully () Work the problems on paper as neee (3) Question

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

Analyzing Tensor Power Method Dynamics in Overcomplete Regime Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

Parameter estimation: A new approach to weighting a priori information

Parameter estimation: A new approach to weighting a priori information Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a

More information

Final Exam Study Guide and Practice Problems Solutions

Final Exam Study Guide and Practice Problems Solutions Final Exam Stuy Guie an Practice Problems Solutions Note: These problems are just some of the types of problems that might appear on the exam. However, to fully prepare for the exam, in aition to making

More information

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain Nonlinear Aaptive Ship Course Tracking Control Base on Backstepping an Nussbaum Gain Jialu Du, Chen Guo Abstract A nonlinear aaptive controller combining aaptive Backstepping algorithm with Nussbaum gain

More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

7.1 Support Vector Machine

7.1 Support Vector Machine 67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

Calculus Class Notes for the Combined Calculus and Physics Course Semester I

Calculus Class Notes for the Combined Calculus and Physics Course Semester I Calculus Class Notes for the Combine Calculus an Physics Course Semester I Kelly Black December 14, 2001 Support provie by the National Science Founation - NSF-DUE-9752485 1 Section 0 2 Contents 1 Average

More information

Stable and compact finite difference schemes

Stable and compact finite difference schemes Center for Turbulence Research Annual Research Briefs 2006 2 Stable an compact finite ifference schemes By K. Mattsson, M. Svär AND M. Shoeybi. Motivation an objectives Compact secon erivatives have long

More information

A Sketch of Menshikov s Theorem

A Sketch of Menshikov s Theorem A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p

More information

All s Well That Ends Well: Supplementary Proofs

All s Well That Ends Well: Supplementary Proofs All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee

More information

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University

More information

Generalized Tractability for Multivariate Problems

Generalized Tractability for Multivariate Problems Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,

More information

BEYOND THE CONSTRUCTION OF OPTIMAL SWITCHING SURFACES FOR AUTONOMOUS HYBRID SYSTEMS. Mauro Boccadoro Magnus Egerstedt Paolo Valigi Yorai Wardi

BEYOND THE CONSTRUCTION OF OPTIMAL SWITCHING SURFACES FOR AUTONOMOUS HYBRID SYSTEMS. Mauro Boccadoro Magnus Egerstedt Paolo Valigi Yorai Wardi BEYOND THE CONSTRUCTION OF OPTIMAL SWITCHING SURFACES FOR AUTONOMOUS HYBRID SYSTEMS Mauro Boccaoro Magnus Egerstet Paolo Valigi Yorai Wari {boccaoro,valigi}@iei.unipg.it Dipartimento i Ingegneria Elettronica

More information

Lower bounds on Locality Sensitive Hashing

Lower bounds on Locality Sensitive Hashing Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

Generalization of the persistent random walk to dimensions greater than 1

Generalization of the persistent random walk to dimensions greater than 1 PHYSICAL REVIEW E VOLUME 58, NUMBER 6 DECEMBER 1998 Generalization of the persistent ranom walk to imensions greater than 1 Marián Boguñá, Josep M. Porrà, an Jaume Masoliver Departament e Física Fonamental,

More information

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences. S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial

More information

Exponential asymptotic property of a parallel repairable system with warm standby under common-cause failure

Exponential asymptotic property of a parallel repairable system with warm standby under common-cause failure J. Math. Anal. Appl. 341 (28) 457 466 www.elsevier.com/locate/jmaa Exponential asymptotic property of a parallel repairable system with warm stanby uner common-cause failure Zifei Shen, Xiaoxiao Hu, Weifeng

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

The Press-Schechter mass function

The Press-Schechter mass function The Press-Schechter mass function To state the obvious: It is important to relate our theories to what we can observe. We have looke at linear perturbation theory, an we have consiere a simple moel for

More information

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10 Some vector algebra an the generalize chain rule Ross Bannister Data Assimilation Research Centre University of Reaing UK Last upate 10/06/10 1. Introuction an notation As we shall see in these notes the

More information

THE ACCURATE ELEMENT METHOD: A NEW PARADIGM FOR NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS

THE ACCURATE ELEMENT METHOD: A NEW PARADIGM FOR NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS THE PUBISHING HOUSE PROCEEDINGS O THE ROMANIAN ACADEMY, Series A, O THE ROMANIAN ACADEMY Volume, Number /, pp. 6 THE ACCURATE EEMENT METHOD: A NEW PARADIGM OR NUMERICA SOUTION O ORDINARY DIERENTIA EQUATIONS

More information

Differentiability, Computing Derivatives, Trig Review

Differentiability, Computing Derivatives, Trig Review Unit #3 : Differentiability, Computing Derivatives, Trig Review Goals: Determine when a function is ifferentiable at a point Relate the erivative graph to the the graph of an original function Compute

More information

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE Journal of Soun an Vibration (1996) 191(3), 397 414 THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE E. M. WEINSTEIN Galaxy Scientific Corporation, 2500 English Creek

More information

Dissipative numerical methods for the Hunter-Saxton equation

Dissipative numerical methods for the Hunter-Saxton equation Dissipative numerical methos for the Hunter-Saton equation Yan Xu an Chi-Wang Shu Abstract In this paper, we present further evelopment of the local iscontinuous Galerkin (LDG) metho esigne in [] an a

More information

ELEC3114 Control Systems 1

ELEC3114 Control Systems 1 ELEC34 Control Systems Linear Systems - Moelling - Some Issues Session 2, 2007 Introuction Linear systems may be represente in a number of ifferent ways. Figure shows the relationship between various representations.

More information

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University

More information

Calculus of Variations

Calculus of Variations 16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t

More information

Differentiability, Computing Derivatives, Trig Review. Goals:

Differentiability, Computing Derivatives, Trig Review. Goals: Secants vs. Derivatives - Unit #3 : Goals: Differentiability, Computing Derivatives, Trig Review Determine when a function is ifferentiable at a point Relate the erivative graph to the the graph of an

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information

Fractional Geometric Calculus: Toward A Unified Mathematical Language for Physics and Engineering

Fractional Geometric Calculus: Toward A Unified Mathematical Language for Physics and Engineering Fractional Geometric Calculus: Towar A Unifie Mathematical Language for Physics an Engineering Xiong Wang Center of Chaos an Complex Network, Department of Electronic Engineering, City University of Hong

More information

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers International Journal of Statistics an Probability; Vol 6, No 5; September 207 ISSN 927-7032 E-ISSN 927-7040 Publishe by Canaian Center of Science an Eucation Improving Estimation Accuracy in Nonranomize

More information

Range and speed of rotor walks on trees

Range and speed of rotor walks on trees Range an spee of rotor wals on trees Wilfrie Huss an Ecaterina Sava-Huss May 15, 1 Abstract We prove a law of large numbers for the range of rotor wals with ranom initial configuration on regular trees

More information