A Unified Approach for Learning the Parameters of Sum-Product Networks


A Unified Approach for Learning the Parameters of Sum-Product Networks

Han Zhao, Machine Learning Dept., Carnegie Mellon University
Pascal Poupart, School of Computer Science, University of Waterloo
Geoff Gordon, Machine Learning Dept., Carnegie Mellon University

Abstract

We present a unified approach for learning the parameters of Sum-Product Networks (SPNs). We prove that any complete and decomposable SPN is equivalent to a mixture of trees where each tree corresponds to a product of univariate distributions. Based on the mixture model perspective, we characterize the objective function when learning SPNs based on the maximum likelihood estimation (MLE) principle and show that the optimization problem can be formulated as a signomial program. We construct two parameter learning algorithms for SPNs by using sequential monomial approximations (SMA) and the concave-convex procedure (CCCP), respectively. The two proposed methods naturally admit multiplicative updates, hence effectively avoiding the projection operation. With the help of the unified framework, we also show that, in the case of SPNs, CCCP leads to the same algorithm as Expectation Maximization (EM) despite the fact that they are different in general.

1 Introduction

Sum-product networks (SPNs) are new deep graphical model architectures that admit exact probabilistic inference in time linear in the size of the network [14]. As with traditional graphical models, there are two main problems when learning SPNs: structure learning and parameter learning. Parameter learning is interesting even if we know the ground-truth structure ahead of time; structure learning depends on parameter learning, so better parameter learning can often lead to better structure learning. Poon and Domingos [14] and Gens and Domingos [6] proposed both generative and discriminative learning algorithms for the parameters of SPNs. At a high level, these approaches view SPNs as deep architectures and apply projected gradient descent (PGD) to optimize the data log-likelihood. There are several drawbacks associated with PGD. For example, the projection step in PGD hurts the convergence of the algorithm and often leads to solutions on the boundary of the feasible region. Also, PGD contains an additional arbitrary parameter, the projection margin, which can be hard to set well in practice. In [14, 6], the authors also mentioned the possibility of applying EM algorithms to train SPNs by viewing sum nodes in SPNs as hidden variables. They presented an EM update formula without details. However, the update formula for EM given in [14, 6] is incorrect, as first pointed out and corrected by [12].

In this paper we take a different perspective and present a unified framework, which treats [14, 6] as special cases, for learning the parameters of SPNs. We prove that any complete and decomposable SPN is equivalent to a mixture of trees where each tree corresponds to a product of univariate distributions. Based on the mixture model perspective, we can precisely characterize the functional form of the objective function based on the network structure. We show that the optimization problem associated with learning the parameters of SPNs based on the MLE principle can be formulated as a signomial program (SP), where both PGD and exponentiated gradient (EG) can be viewed as first-order approximations of the signomial program after suitable transformations of the objective function.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

We also show that the signomial program formulation can be equivalently transformed into a difference-of-convex-functions (DCP) formulation, where the objective function of the program can be naturally expressed as a difference of two convex functions. The DCP formulation allows us to develop two efficient optimization algorithms for learning the parameters of SPNs based on sequential monomial approximations (SMA) and the concave-convex procedure (CCCP), respectively. Both proposed approaches naturally admit multiplicative updates, hence effectively dealing with the positivity constraints of the optimization. Furthermore, under our unified framework, we also show that CCCP leads to the same algorithm as EM despite the fact that these two approaches are different from each other in general. Although we mainly focus on MLE-based parameter learning, the mixture model interpretation of SPNs also helps to develop a Bayesian learning method for SPNs [21]. PGD, EG, SMA and CCCP can all be viewed as different levels of convex relaxation of the original SP. Hence the framework also provides an intuitive way to compare all four approaches. We conduct extensive experiments on 20 benchmark data sets to compare the empirical performance of PGD, EG, SMA and CCCP. Experimental results validate our theoretical analysis that CCCP is the best among all four approaches, showing that it converges consistently faster and with more stability than the other three methods. Furthermore, we use CCCP to boost the performance of LearnSPN [7], showing that it can achieve results comparable to state-of-the-art structure learning algorithms using SPNs with much smaller network sizes.

2 Background

2.1 Sum-Product Networks

To simplify the discussion of the main idea of our unified framework, we focus our attention on SPNs over Boolean random variables. However, the framework presented here is general and can be easily extended to other discrete and continuous random variables. We first define the notion of network polynomial. We use $\mathbb{I}_x$ to denote an indicator variable that returns 1 when $X = x$ and 0 otherwise.

Definition 1 (Network Polynomial [4]). Let $f(\cdot) \geq 0$ be an unnormalized probability distribution over a Boolean random vector $X_{1:N}$. The network polynomial of $f(\cdot)$ is a multilinear function $\sum_{x} f(x) \prod_{n=1}^{N} \mathbb{I}_{x_n}$ of the indicator variables, where the summation is over all possible instantiations of the Boolean random vector $X_{1:N}$.

A Sum-Product Network (SPN) over Boolean variables $X_{1:N}$ is a rooted DAG that computes the network polynomial over $X_{1:N}$. The leaves are univariate indicators of Boolean variables and internal nodes are either sums or products. Each sum node computes a weighted sum of its children and each product node computes the product of its children. The scope of a node in an SPN is defined as the set of variables that have indicators among the node's descendants. For any node $v$ in an SPN, if $v$ is a terminal node, say, an indicator variable over $X$, then $\text{scope}(v) = \{X\}$; otherwise $\text{scope}(v) = \bigcup_{\tilde{v} \in Ch(v)} \text{scope}(\tilde{v})$. An SPN is complete iff each sum node has children with the same scope. An SPN is decomposable iff for every product node $v$, $\text{scope}(v_i) \cap \text{scope}(v_j) = \emptyset$ for $v_i, v_j \in Ch(v)$, $i \neq j$. The scope of the root node is $\{X_1, \ldots, X_N\}$. In this paper, we focus on complete and decomposable SPNs. For a complete and decomposable SPN $S$, each node $v$ in $S$ defines a network polynomial $f_v(\cdot)$ which corresponds to the sub-SPN (subgraph) rooted at $v$. The network polynomial of $S$, denoted by $f_S$, is the network polynomial defined by the root of $S$, which can be computed recursively from its children.

The probability distribution induced by an SPN $S$ is defined as $\Pr_S(x) \triangleq \frac{f_S(x)}{\sum_x f_S(x)}$. The normalization constant $\sum_x f_S(x)$ can be computed in $O(|S|)$ in SPNs by setting the values of all the leaf nodes to 1, i.e., $\sum_x f_S(x) = f_S(\mathbf{1})$ [14]. This leads to efficient joint/marginal/conditional inference in SPNs.
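To make this concrete, here is a minimal sketch (our illustration, not the authors' implementation; the tuple encoding, network structure, and weights are assumptions) that evaluates the network polynomial of a tiny complete and decomposable SPN bottom-up and obtains the normalization constant by setting every leaf indicator to 1:

```python
# Minimal sketch of bottom-up SPN evaluation.  Nodes are nested tuples:
# ('leaf', var, val), ('sum', [(weight, child), ...]), ('prod', [child, ...]).

def value(node, x):
    kind = node[0]
    if kind == 'leaf':
        _, var, val = node
        # Setting x[var] = None turns the indicator into 1, which marginalizes X_var.
        return 1.0 if x[var] is None or x[var] == val else 0.0
    if kind == 'sum':
        return sum(w * value(c, x) for w, c in node[1])
    out = 1.0                      # product node
    for c in node[1]:
        out *= value(c, x)
    return out

# Toy SPN over two Boolean variables X1 (index 0) and X2 (index 1).
S = ('sum', [(0.8, ('prod', [('sum', [(0.6, ('leaf', 0, 1)), (0.4, ('leaf', 0, 0))]),
                             ('sum', [(0.3, ('leaf', 1, 1)), (0.7, ('leaf', 1, 0))])])),
             (0.2, ('prod', [('leaf', 0, 1), ('leaf', 1, 1)]))])

f_x = value(S, {0: 1, 1: 0})            # f_S(x) for x = (X1 = 1, X2 = 0)
f_1 = value(S, {0: None, 1: None})      # f_S(1): every leaf indicator set to 1
print(f_x, f_1, f_x / f_1)              # Pr_S(x) = f_S(x) / f_S(1) = 0.336 / 1.0
```

Both quantities take a single linear pass over the network, which is what makes joint, marginal, and conditional queries tractable.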
2.2 Signomial Programming (SP)

Before introducing SP, we first introduce geometric programming (GP), which is a strict subclass of SP. A monomial is defined as a function $h : \mathbb{R}^n_{++} \to \mathbb{R}$ of the form $h(\mathbf{x}) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$, where the domain is restricted to the positive orthant ($\mathbb{R}^n_{++}$), the coefficient $c$ is positive, and the exponents $a_i \in \mathbb{R}, \forall i$. A posynomial is a sum of monomials: $g(\mathbf{x}) = \sum_{k=1}^{K} c_k\, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}}$. One of the key properties of posynomials is positivity, which allows us to transform any posynomial into the log domain.

A GP in standard form is defined to be an optimization problem where both the objective function and the inequality constraints are posynomials and the equality constraints are monomials. There is also an implicit constraint that $\mathbf{x} \in \mathbb{R}^n_{++}$. A GP in its standard form is not a convex program since posynomials are not convex functions in general. However, we can effectively transform it into a convex problem by using the logarithmic transformation trick on $\mathbf{x}$, the multiplicative coefficients of each monomial, and also each objective/constraint function [3, 1].

An SP has the same form as a GP except that the multiplicative constant inside each monomial is not restricted to be positive, i.e., it can take any real value. Although the difference seems small, there is a huge difference between GP and SP from the computational perspective. The negative multiplicative constant in monomials invalidates the logarithmic transformation trick frequently used in GP. As a result, SPs cannot be reduced to convex programs and are believed to be hard to solve in general [1].

3 Unified Approach for Learning

In this section we show that the parameter learning problem of SPNs based on the MLE principle can be formulated as an SP. We will use a sequence of optimal monomial approximations combined with backtracking line search and the concave-convex procedure to tackle the SP. Due to space constraints, we refer interested readers to the supplementary material for all the proof details.

3.1 Sum-Product Networks as a Mixture of Trees

We introduce the notion of induced trees from SPNs and use it to show that every complete and decomposable SPN can be interpreted as a mixture of induced trees, where each induced tree corresponds to a product of univariate distributions. From this perspective, an SPN can be understood as a huge mixture model where the effective number of components in the mixture is determined by its network structure. The method we describe here is not the first method for interpreting an SPN (or the related arithmetic circuit) as a mixture distribution [20, 5, 2], but the new method can result in an exponentially smaller mixture; see the end of this section for more details.

Definition 2 (Induced SPN). Given a complete and decomposable SPN $S$ over $X_{1:N}$, let $\mathcal{T} = (\mathcal{T}_V, \mathcal{T}_E)$ be a subgraph of $S$. $\mathcal{T}$ is called an induced SPN from $S$ if
1. $\text{Root}(S) \in \mathcal{T}_V$.
2. If $v \in \mathcal{T}_V$ is a sum node, then exactly one child of $v$ in $S$ is in $\mathcal{T}_V$, and the corresponding edge is in $\mathcal{T}_E$.
3. If $v \in \mathcal{T}_V$ is a product node, then all the children of $v$ in $S$ are in $\mathcal{T}_V$, and the corresponding edges are in $\mathcal{T}_E$.

Theorem 1. If $\mathcal{T}$ is an induced SPN from a complete and decomposable SPN $S$, then $\mathcal{T}$ is a tree that is complete and decomposable.

As a result of Thm. 1, we will use the terms induced SPN and induced tree interchangeably. With some abuse of notation, we use $\mathcal{T}(x)$ to mean the value of the network polynomial of $\mathcal{T}$ with input vector $x$.

Theorem 2. If $\mathcal{T}$ is an induced tree from $S$ over $X_{1:N}$, then $\mathcal{T}(x) = \prod_{(v_i, v_j) \in \mathcal{T}_E} w_{ij} \prod_{n=1}^{N} \mathbb{I}_{x_n}$, where $w_{ij}$ is the edge weight of $(v_i, v_j)$ if $v_i$ is a sum node and $w_{ij} = 1$ if $v_i$ is a product node.

Remark. Although we focus our attention on Boolean random variables for simplicity of discussion and illustration, Thm. 2 can be extended to the case where the univariate distributions at the leaf nodes are continuous or discrete distributions with countably infinitely many values, e.g., Gaussian or Poisson distributions. We can simply replace the product of univariate distributions term, $\prod_{n=1}^{N} \mathbb{I}_{x_n}$, in Thm. 2
with the general form $\prod_{n=1}^{N} p_n(x_n)$, where $p_n(X_n)$ is a univariate distribution over $X_n$. Also note that it is possible for two unique induced trees to share the same product of univariate distributions, but in this case their weight terms $\prod_{(v_i, v_j) \in \mathcal{T}_E} w_{ij}$ are guaranteed to be different. As we will see shortly, Thm. 2 implies that the joint distribution over $\{X_n\}_{n=1}^{N}$ represented by an SPN is essentially a mixture model with potentially exponentially many components in the mixture.

Definition 3 (Network cardinality). The network cardinality $\tau_S$ of an SPN $S$ is the number of unique induced trees.

Theorem 3. $\tau_S = f_S(\mathbf{1} \mid \mathbf{1})$, where $f_S(\mathbf{1} \mid \mathbf{1})$ is the value of the network polynomial of $S$ with input vector $\mathbf{1}$ and all edge weights set to 1.

Theorem 4. $f_S(x) = \sum_{t=1}^{\tau_S} \mathcal{T}_t(x)$, where $\mathcal{T}_t$ is the $t$-th unique induced tree of $S$.

Remark. The above four theorems prove the fact that an SPN $S$ is an ensemble or mixture of trees, where each tree computes an unnormalized distribution over $X_{1:N}$. The total number of unique trees in $S$ is the network cardinality $\tau_S$, which only depends on the structure of $S$. Each component is a simple product of univariate distributions. We illustrate the theorems above with a simple example in Fig. 1.

Figure 1: A complete and decomposable SPN is a mixture of induced trees. Double circles indicate univariate distributions over $X_1$ and $X_2$. Different colors are used to highlight unique induced trees; each induced tree is a product of univariate distributions over $X_1$ and $X_2$.

Zhao et al. [20] show that every complete and decomposable SPN is equivalent to a bipartite Bayesian network with a layer of hidden variables and a layer of observable random variables. The number of hidden variables in the bipartite Bayesian network is equal to the number of sum nodes in $S$. A naive expansion of such a Bayesian network into a mixture model leads to a huge mixture with $2^{O(M)}$ components, where $M$ is the number of sum nodes in $S$. Here we complement their theory and show that each complete and decomposable SPN is essentially a mixture of trees and that the effective number of unique induced trees is given by $\tau_S$. Note that $\tau_S = f_S(\mathbf{1} \mid \mathbf{1})$ depends only on the network structure and can often be much smaller than $2^{O(M)}$. Without loss of generality, assuming that in $S$ layers of sum nodes alternate with layers of product nodes, $f_S(\mathbf{1} \mid \mathbf{1}) = \Omega(2^{h})$, where $h$ is the height of $S$. However, the exponentially many trees are recursively merged and combined in $S$ such that the overall network size remains tractable.

3.2 Maximum Likelihood Estimation as SP

Let us consider the likelihood function computed by an SPN $S$ over $N$ binary random variables with model parameters $w$ and input vector $x \in \{0, 1\}^N$. Here the model parameters in $S$ are the edge weights of the sum nodes, and we collect them together into a long vector $w \in \mathbb{R}^D_{++}$, where $D$ corresponds to the number of edges emanating from sum nodes in $S$. By definition, the probability distribution induced by $S$ can be computed by
$$\Pr_S(x \mid w) \triangleq \frac{f_S(x \mid w)}{\sum_x f_S(x \mid w)} = \frac{f_S(x \mid w)}{f_S(\mathbf{1} \mid w)}.$$

Corollary 5. Let $S$ be an SPN with weights $w \in \mathbb{R}^D_{++}$ over input vector $x \in \{0, 1\}^N$. The network polynomial $f_S(x \mid w)$ is a posynomial: $f_S(x \mid w) = \sum_{t=1}^{f_S(\mathbf{1} \mid \mathbf{1})} \prod_{n=1}^{N} \mathbb{I}^{(t)}_{x_n} \prod_{d=1}^{D} w_d^{\mathbb{I}_{w_d \in \mathcal{T}_t}}$, where $\mathbb{I}_{w_d \in \mathcal{T}_t}$ indicates whether $w_d$ appears in the $t$-th induced tree $\mathcal{T}_t$ or not. Each monomial corresponds exactly to a unique induced tree SPN from $S$.

The above statement is a direct corollary of Thm. 2, Thm. 3 and Thm. 4. From the definition of the network polynomial, we know that $f_S$ is a multilinear function of the indicator variables. Corollary 5 works as a complement to characterize the functional form of a network polynomial in terms of $w$. It follows that the likelihood function $L_S(w) \triangleq \Pr_S(x \mid w)$ can be expressed as the ratio of two posynomial functions.
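To make the mixture-of-trees view concrete, the following sketch (illustrative code, not from the paper; the toy SPN structure and weights are assumptions) enumerates the induced trees of a small SPN and checks Theorems 3 and 4 numerically:

```python
# Sketch: enumerate the induced trees of a small SPN and verify that their number
# equals f_S(1|1) (Thm. 3) and that f_S(x) is the sum of the tree values (Thm. 4).
from itertools import product as cartesian

def value(node, x, unit_weights=False):
    kind = node[0]
    if kind == 'leaf':
        _, var, val = node
        return 1.0 if x[var] is None or x[var] == val else 0.0
    if kind == 'sum':
        return sum((1.0 if unit_weights else w) * value(c, x, unit_weights)
                   for w, c in node[1])
    out = 1.0                      # product node
    for c in node[1]:
        out *= value(c, x, unit_weights)
    return out

def induced_trees(node):
    """Yield (product of edge weights, list of leaves) for every induced tree."""
    if node[0] == 'leaf':
        yield 1.0, [node]
        return
    if node[0] == 'sum':
        for w, c in node[1]:                       # a sum node keeps exactly one child
            for wp, leaves in induced_trees(c):
                yield w * wp, leaves
        return
    # a product node keeps all children: combine one induced subtree per child
    for combo in cartesian(*[list(induced_trees(c)) for c in node[1]]):
        wp, leaves = 1.0, []
        for w, ls in combo:
            wp *= w
            leaves += ls
        yield wp, leaves

# Toy SPN over X1, X2 (illustrative structure and weights).
S = ('sum', [(0.8, ('prod', [('sum', [(0.6, ('leaf', 0, 1)), (0.4, ('leaf', 0, 0))]),
                             ('sum', [(0.3, ('leaf', 1, 1)), (0.7, ('leaf', 1, 0))])])),
             (0.2, ('prod', [('leaf', 0, 1), ('leaf', 1, 1)]))])

trees = list(induced_trees(S))
tau = value(S, {0: None, 1: None}, unit_weights=True)     # f_S(1|1)
x = {0: 1, 1: 0}
mixture = sum(wp * value(('prod', leaves), x) for wp, leaves in trees)
print(len(trees), tau)        # both equal the network cardinality (5 for this toy SPN)
print(mixture, value(S, x))   # agree up to floating point: f_S(x) = sum_t T_t(x)
```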
We now show that the optimization problem based on MLE is an SP. Using the definition of $\Pr(x \mid w)$ and Corollary 5, and letting $\tau = f_S(\mathbf{1} \mid \mathbf{1})$, the MLE problem can be rewritten as
$$\begin{aligned}
\underset{w}{\text{maximize}} \quad & \frac{f_S(x \mid w)}{f_S(\mathbf{1} \mid w)} = \frac{\sum_{t=1}^{\tau} \prod_{n=1}^{N} \mathbb{I}^{(t)}_{x_n} \prod_{d=1}^{D} w_d^{\mathbb{I}_{w_d \in \mathcal{T}_t}}}{\sum_{t=1}^{\tau} \prod_{d=1}^{D} w_d^{\mathbb{I}_{w_d \in \mathcal{T}_t}}} \\
\text{subject to} \quad & w \in \mathbb{R}^D_{++}
\end{aligned} \qquad (1)$$

Proposition 6. The MLE problem for SPNs is a signomial program.

Being nonconvex in general, an SP is essentially hard to solve from a computational perspective [1, 3]. However, despite the hardness of SP in general, the objective function in the MLE formulation of SPNs has a special structure, i.e., it is the ratio of two posynomials, which makes the design of efficient optimization algorithms possible.

3.3 Difference of Convex Functions

Both PGD and EG are first-order methods and they can be viewed as approximating the SP after applying a logarithmic transformation to the objective function only. Although (1) is a signomial program, its objective function is expressed as the ratio of two posynomials. Hence, we can still apply the logarithmic transformation trick used in geometric programming to its objective function and to the variables to be optimized. More concretely, let $w_d = \exp(y_d), \forall d$, and take the log of the objective function; the problem becomes equivalent to maximizing the following new objective without any constraint on $y$:
$$\underset{y}{\text{maximize}} \quad \log\!\left(\sum_{t=1}^{\tau(x)} \exp\!\left(\sum_{d=1}^{D} y_d\, \mathbb{I}_{y_d \in \mathcal{T}_t}\right)\right) - \log\!\left(\sum_{t=1}^{\tau} \exp\!\left(\sum_{d=1}^{D} y_d\, \mathbb{I}_{y_d \in \mathcal{T}_t}\right)\right) \qquad (2)$$
Note that in the first term of Eq. 2 the upper index $\tau(x) \leq \tau \triangleq f_S(\mathbf{1} \mid \mathbf{1})$ depends on the current input $x$. By transforming into the log-space, we naturally guarantee the positivity of the solution at each iteration, hence transforming a constrained optimization problem into an unconstrained one without any sacrifice. Both terms in Eq. 2 are convex functions in $y$ after the transformation. Hence, the transformed objective function is now expressed as the difference of two convex functions, which is called a DC function [9]. This helps us to design two efficient algorithms to solve the problem based on the general idea of sequential convex approximations for nonlinear programming.

3.3.1 Sequential Monomial Approximation

Let us consider the linearization of both terms in Eq. 2 in order to apply first-order methods in the transformed space. To compute the gradient with respect to different components of $y$, we view each node of an SPN as an intermediate function of the network polynomial and apply the chain rule to back-propagate the gradient. The differentiation of $f_S(x \mid w)$ with respect to the root node of the network is set to 1. The differentiation of the network polynomial with respect to the partial function at each node can then be computed in two passes of the network: the bottom-up pass evaluates the values of all partial functions given the current input $x$, and the top-down pass differentiates the network polynomial with respect to each partial function. Following the evaluation-differentiation passes, the gradient of the objective function in (2) can be computed in $O(|S|)$. Furthermore, although the computation is conducted in $y$, the results are fully expressed in terms of $w$, which suggests that in practice we do not need to explicitly construct $y$ from $w$. Let $f(y) = \log f_S(x \mid \exp(y)) - \log f_S(\mathbf{1} \mid \exp(y))$. It follows that approximating $f(y)$ with the best linear function is equivalent to using the best monomial approximation of the signomial program (1). This leads to a sequential monomial approximation of the original SP formulation: at each iteration $y^{(k)}$, we linearize both terms in Eq. 2 and form the optimal monomial function in terms of $w^{(k)}$.
The additive update of $y^{(k)}$ leads to a multiplicative update of $w^{(k)}$ since $w^{(k)} = \exp(y^{(k)})$, and we use a backtracking line search to determine the step size of the update in each iteration.
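The evaluation-differentiation passes and the resulting multiplicative SMA step can be sketched as follows (illustrative Python, not the paper's C++ implementation; the toy SPN, its weights, and the step size $\gamma$ are assumptions). The bottom-up pass computes every node value $f_v$, the top-down pass computes $\partial f_S / \partial f_v$, and the chain rule gives $\partial f_S / \partial w_{ij} = f_{v_j} \cdot \partial f_S / \partial f_{v_i}$:

```python
# Sketch of the two-pass gradient computation used by PGD/EG/SMA (illustrative).
import math
from collections import defaultdict

def forward(node, x, vals):
    """Bottom-up pass: vals[id(node)] = f_node(x); x[var] = None marginalizes X_var."""
    kind = node[0]
    if kind == 'leaf':
        _, var, val = node
        v = 1.0 if x[var] is None or x[var] == val else 0.0
    elif kind == 'sum':
        v = sum(w * forward(c, x, vals) for w, c in node[1])
    else:  # product node
        v = 1.0
        for c in node[1]:
            v *= forward(c, x, vals)
    vals[id(node)] = v
    return v

def backward(node, vals, derivs, upstream=1.0):
    """Top-down pass: derivs[id(v)] accumulates df_S/df_v (the root is seeded with 1)."""
    derivs[id(node)] += upstream
    if node[0] == 'sum':
        for w, c in node[1]:
            backward(c, vals, derivs, upstream * w)
    elif node[0] == 'prod':
        for c in node[1]:
            siblings = 1.0
            for o in node[1]:
                if o is not c:
                    siblings *= vals[id(o)]
            backward(c, vals, derivs, upstream * siblings)

def weight_grads(root, x):
    """Return d log f_S(x|w) / d w_ij for every sum-node edge, keyed by (id(v_i), j)."""
    vals, derivs = {}, defaultdict(float)
    f_S = forward(root, x, vals)
    backward(root, vals, derivs)
    grads, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node[0] == 'sum':
            for j, (w, c) in enumerate(node[1]):
                # chain rule: df_S/dw_ij = f_{v_j} * df_S/df_{v_i}; divide by f_S for the log
                grads[(id(node), j)] = (w, vals[id(c)] * derivs[id(node)] / f_S)
                stack.append(c)
        elif node[0] == 'prod':
            stack.extend(node[1])
    return grads

# Toy SPN over X1, X2 (same illustrative structure and weights as the earlier sketches).
S = ('sum', [(0.8, ('prod', [('sum', [(0.6, ('leaf', 0, 1)), (0.4, ('leaf', 0, 0))]),
                             ('sum', [(0.3, ('leaf', 1, 1)), (0.7, ('leaf', 1, 0))])])),
             (0.2, ('prod', [('leaf', 0, 1), ('leaf', 1, 1)]))])

g_x = weight_grads(S, {0: 1, 1: 0})        # gradient of log f_S(x|w)
g_1 = weight_grads(S, {0: None, 1: None})  # gradient of log f_S(1|w)
gamma = 0.1  # step size; the paper chooses it by backtracking line search
# SMA step: additive in y = log w, hence multiplicative in w.
new_w = {k: w * math.exp(gamma * w * (g_x[k][1] - g_1[k][1])) for k, (w, _) in g_x.items()}
print(new_w)
```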

3.3.2 Concave-Convex Procedure

Sequential monomial approximation fails to use the structure of the problem when learning SPNs. Here we propose another approach based on the concave-convex procedure (CCCP) [18] to use the fact that the objective function is expressed as the difference of two convex functions. At a high level, CCCP solves a sequence of concave surrogate optimizations until convergence. In many cases, the maximum of a concave surrogate function can only be solved using other convex solvers, and as a result the efficiency of CCCP highly depends on the choice of the convex solvers. However, we show that by a suitable transformation of the network we can compute the maximum of the concave surrogate in closed form in time that is linear in the network size, which leads to a very efficient algorithm for learning the parameters of SPNs. We also prove the convergence properties of our algorithm.

Consider the objective function to be maximized in the DCP: $f(y) = \log f_S(x \mid \exp(y)) - \log f_S(\mathbf{1} \mid \exp(y)) \triangleq f_1(y) + f_2(y)$, where $f_1(y) \triangleq \log f_S(x \mid \exp(y))$ is a convex function and $f_2(y) \triangleq -\log f_S(\mathbf{1} \mid \exp(y))$ is a concave function. We can linearize only the convex part $f_1(y)$ to obtain a surrogate function
$$\hat{f}(y, z) = f_1(z) + \nabla_z f_1(z)^{T}(y - z) + f_2(y) \qquad (3)$$
for all $y, z \in \mathbb{R}^D$. Now $\hat{f}(y, z)$ is a concave function in $y$. Due to the convexity of $f_1(y)$ we have $f_1(y) \geq f_1(z) + \nabla_z f_1(z)^T (y - z)$ for all $y, z$, and as a result the following two properties always hold for all $y, z$: $\hat{f}(y, z) \leq f(y)$ and $\hat{f}(y, y) = f(y)$. CCCP updates $y$ at each iteration $k$ by solving $y^{(k)} \in \arg\max_y \hat{f}(y, y^{(k-1)})$, unless we already have $y^{(k-1)} \in \arg\max_y \hat{f}(y, y^{(k-1)})$, in which case a generalized fixed point $y^{(k-1)}$ has been found and the algorithm stops.

It is easy to show that at each iteration of CCCP we always have $f(y^{(k)}) \geq f(y^{(k-1)})$. Note also that $f(y)$ computes the log-likelihood of the input $x$ and therefore is bounded above by 0. By the monotone convergence theorem, $\lim_{k \to \infty} f(y^{(k)})$ exists and the sequence $\{f(y^{(k)})\}$ converges.

We now discuss how to compute a closed-form solution for the maximization of the concave surrogate $\hat{f}(y, y^{(k-1)})$. Since $\hat{f}(y, y^{(k-1)})$ is differentiable and concave for any fixed $y^{(k-1)}$, a sufficient and necessary condition for its maximum is
$$\nabla_y \hat{f}(y, y^{(k-1)}) = \nabla_{y^{(k-1)}} f_1(y^{(k-1)}) + \nabla_y f_2(y) = 0 \qquad (4)$$
In the above equation, if we consider only the partial derivative with respect to $y_{ij}$ ($w_{ij}$), we obtain
$$\frac{w_{ij}^{(k-1)} f_{v_j}(x \mid w^{(k-1)})}{f_S(x \mid w^{(k-1)})} \frac{\partial f_S(x \mid w^{(k-1)})}{\partial f_{v_i}(x \mid w^{(k-1)})} = \frac{w_{ij} f_{v_j}(\mathbf{1} \mid w)}{f_S(\mathbf{1} \mid w)} \frac{\partial f_S(\mathbf{1} \mid w)}{\partial f_{v_i}(\mathbf{1} \mid w)} \qquad (5)$$
Eq. 5 leads to a system of $D$ nonlinear equations, which is hard to solve in closed form. However, if we do a change of variable by considering locally normalized weights $w'_{ij}$ (i.e., $w'_{ij} \geq 0$ and $\sum_j w'_{ij} = 1, \forall i$), then a solution can be easily computed. As described in [13, 20], any SPN can be transformed into an equivalent normal SPN with locally normalized weights in a bottom-up pass as follows:
$$w'_{ij} = \frac{w_{ij} f_{v_j}(\mathbf{1} \mid w)}{\sum_j w_{ij} f_{v_j}(\mathbf{1} \mid w)} \qquad (6)$$
We can then replace $w_{ij} f_{v_j}(\mathbf{1} \mid w)$ in the above equation by the expression it equals in Eq. 5 to obtain a closed-form solution:
$$w'_{ij} \propto w_{ij}^{(k-1)} \frac{f_{v_j}(x \mid w^{(k-1)})}{f_S(x \mid w^{(k-1)})} \frac{\partial f_S(x \mid w^{(k-1)})}{\partial f_{v_i}(x \mid w^{(k-1)})} \qquad (7)$$
Note that in the above derivation both $f_{v_i}(\mathbf{1} \mid w)/f_S(\mathbf{1} \mid w)$ and $\partial f_S(\mathbf{1} \mid w)/\partial f_{v_i}(\mathbf{1} \mid w)$ can be treated as constants and hence absorbed, since $w'_{ij}, \forall j$, are constrained to be locally normalized. In order to obtain a solution to Eq. 5, for each edge weight $w_{ij}$ the sufficient statistics include only three terms, i.e., the evaluation value at $v_j$, the differentiation value at $v_i$, and the previous edge weight $w_{ij}^{(k-1)}$, all of which can be obtained in two passes of the network for each input $x$. Thus the computational complexity to obtain a maximum of the concave surrogate is $O(|S|)$. Interestingly, Eq. 7 leads to the same update formula as the EM algorithm [12] despite the fact that CCCP and EM start from different perspectives.

We show that all the limit points of the sequence $\{w^{(k)}\}_{k=1}^{\infty}$ are guaranteed to be stationary points of the DCP in (2).

Theorem 7. Let $\{w^{(k)}\}_{k=1}^{\infty}$ be any sequence generated using Eq. 7 from any positive initial point; then all the limiting points of $\{w^{(k)}\}_{k=1}^{\infty}$ are stationary points of the DCP in (2). In addition, $\lim_{k \to \infty} f(y^{(k)}) = f(y^{*})$, where $y^{*}$ is some stationary point of (2).
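The closed-form CCCP/EM step is easy to implement once the evaluation and differentiation passes are available. The sketch below (illustrative only, not the authors' code; the toy SPN and data are assumptions, and Eq. 7 is applied in its batch form by summing the statistic over the data) accumulates $w_{ij} f_{v_j}(x \mid w)\, \partial f_S(x \mid w)/\partial f_{v_i} \,/\, f_S(x \mid w)$ for every sum edge and then locally normalizes at each sum node:

```python
# Sketch of one batch CCCP (equivalently EM) iteration for SPN weights (illustrative).
from collections import defaultdict

def value(node, x, vals):
    """Bottom-up evaluation of the network polynomial; x[var] = None marginalizes X_var."""
    kind = node[0]
    if kind == 'leaf':
        _, var, val = node
        v = 1.0 if x[var] is None or x[var] == val else 0.0
    elif kind == 'sum':
        v = sum(w * value(c, x, vals) for w, c in node[1])
    else:  # product node
        v = 1.0
        for c in node[1]:
            v *= value(c, x, vals)
    vals[id(node)] = v
    return v

def accumulate(node, vals, f_S, stats, upstream=1.0):
    """Top-down pass: add w_ij * f_{v_j}(x|w) * (df_S/df_{v_i}) / f_S(x|w) to each edge."""
    if node[0] == 'sum':
        for j, (w, c) in enumerate(node[1]):
            stats[(id(node), j)] += w * vals[id(c)] * upstream / f_S
            accumulate(c, vals, f_S, stats, upstream * w)
    elif node[0] == 'prod':
        for c in node[1]:
            siblings = 1.0
            for o in node[1]:
                if o is not c:
                    siblings *= vals[id(o)]
            accumulate(c, vals, f_S, stats, upstream * siblings)

def cccp_step(root, data):
    """One CCCP/EM update (Eq. 7): accumulate the statistic over the data, then
    locally normalize so the new weights of every sum node sum to one."""
    stats = defaultdict(float)
    for x in data:
        vals = {}
        f_S = value(root, x, vals)
        accumulate(root, vals, f_S, stats)
    totals = defaultdict(float)
    for (vi, j), s in stats.items():
        totals[vi] += s
    # keys are (id(sum node), child index); values are the locally normalized new weights
    return {(vi, j): s / totals[vi] for (vi, j), s in stats.items()}

# Toy SPN over X1, X2 and a tiny data set (both illustrative assumptions).
S = ('sum', [(0.8, ('prod', [('sum', [(0.6, ('leaf', 0, 1)), (0.4, ('leaf', 0, 0))]),
                             ('sum', [(0.3, ('leaf', 1, 1)), (0.7, ('leaf', 1, 0))])])),
             (0.2, ('prod', [('leaf', 0, 1), ('leaf', 1, 1)]))])
data = [{0: 1, 1: 0}, {0: 0, 1: 0}, {0: 1, 1: 1}]
print(cccp_step(S, data))  # each iteration costs O(|S|) per example: two passes
```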
We summarize all four algorithms and highlight their connections and differences in Table 1. Although we mainly discuss the batch version of these algorithms, all four can be easily adapted to work in stochastic and/or parallel settings.

Table 1: Summary of PGD, EG, SMA and CCCP. "Var." denotes the optimization variables; $f_1(w) \triangleq \log f_S(x \mid w)$, $f_2(w) \triangleq \log f_S(\mathbf{1} \mid w)$, and $\gamma$ denotes the step size.

Algo  | Var.     | Update Type    | Update Formula
PGD   | $w$      | Additive       | $w^{(k+1)} \leftarrow P_{\mathbb{R}^{D}_{++}}\{w^{(k)} + \gamma\,(\nabla_w f_1(w^{(k)}) - \nabla_w f_2(w^{(k)}))\}$
EG    | $w$      | Multiplicative | $w^{(k+1)} \leftarrow w^{(k)} \odot \exp\{\gamma\,(\nabla_w f_1(w^{(k)}) - \nabla_w f_2(w^{(k)}))\}$
SMA   | $\log w$ | Multiplicative | $w^{(k+1)} \leftarrow w^{(k)} \odot \exp\{\gamma\, w^{(k)} \odot (\nabla_w f_1(w^{(k)}) - \nabla_w f_2(w^{(k)}))\}$
CCCP  | $\log w$ | Multiplicative | $w^{(k+1)}_{ij} \propto w^{(k)}_{ij}\, \nabla_{v_i} f_S(w^{(k)})\, f_{v_j}(w^{(k)})$

4 Experiments

4.1 Experimental Setting

We conduct experiments on 20 benchmark data sets from various domains to compare and evaluate the convergence performance of the four algorithms: PGD, EG, SMA and CCCP (EM). These 20 data sets are widely used in [7, 15] to assess different SPNs for the task of density estimation. All the features in the 20 data sets are binary. All the SPNs used for the comparisons of PGD, EG, SMA and CCCP are trained using LearnSPN [7]. We discard the weights returned by LearnSPN and use random weights as initial model parameters. The random weights are determined by the same random seed in all four algorithms. Detailed information about the 20 data sets and the SPNs used in the experiments is provided in the supplementary material.

4.2 Parameter Learning

We implement all four algorithms in C++. For each algorithm, we set the maximum number of iterations to 50. If the absolute difference in the training log-likelihood at two consecutive steps is less than 0.001, the algorithm is stopped. For PGD, EG and SMA, we combine each of them with backtracking line search and use a weight shrinking coefficient of 0.8. The learning rates are initialized to 1.0 for all three methods. For PGD we also use a small projection margin to keep the iterates inside the feasible region. There is no learning rate and no backtracking line search in CCCP. We use a small smoothing parameter in CCCP to avoid numerical issues.
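The training loop just described can be sketched as follows (our illustration, not the authors' C++ code; `update_step`, `log_likelihood`, and `objective` are placeholder callables standing in for the actual SPN routines):

```python
# Sketch of the experimental training protocol: at most 50 iterations, stop when the
# training log-likelihood changes by less than 0.001, and (for PGD/EG/SMA) choose the
# step size by backtracking line search with a shrinking coefficient of 0.8.
import math

def backtracking_step(w, direction, objective, lr_init=1.0, shrink=0.8, max_tries=25):
    """Multiplicative step with backtracking: shrink the learning rate until the
    objective (training log-likelihood) improves, or give up and keep w."""
    base, lr = objective(w), lr_init
    for _ in range(max_tries):
        candidate = [wi * math.exp(lr * di) for wi, di in zip(w, direction)]
        if objective(candidate) > base:
            return candidate
        lr *= shrink
    return w

def train(w, update_step, log_likelihood, max_iters=50, tol=1e-3):
    """Run any of PGD/EG/SMA/CCCP until the training log-likelihood stabilizes."""
    prev = log_likelihood(w)
    for _ in range(max_iters):
        w = update_step(w)                 # one parameter update of the chosen method
        cur = log_likelihood(w)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return w
```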

We show in Fig. 2 the average log-likelihood scores on the 20 training data sets to evaluate the convergence speed and stability of PGD, EG, SMA and CCCP. Clearly, CCCP wins by a large margin over PGD, EG and SMA, both in convergence speed and in solution quality. Furthermore, among the four algorithms, CCCP is the most stable one due to its guarantee that the log-likelihood (on training data) will not decrease after each iteration. As shown in Fig. 2, the training curves of CCCP are smoother than those of the other three methods in almost all the cases. These 20 experiments also clearly show that CCCP often converges in a few iterations. On the other hand, PGD, EG and SMA are on par with each other since they are all first-order methods. SMA is more stable than PGD and EG and often achieves better solutions than both; on large data sets, SMA also converges faster. Surprisingly, EG performs worse than PGD in some cases and is quite unstable despite the fact that it admits multiplicative updates. The hook-shaped curves of PGD on some data sets, e.g. Kosarak and KDD, are due to the projection operations.

Table 2: Average log-likelihoods on test data. Highest log-likelihoods are highlighted in bold. ↑ shows statistically better log-likelihoods than CCCP and ↓ shows statistically worse log-likelihoods than CCCP. The significance is measured based on the Wilcoxon signed-rank test. [Per-data-set values were not preserved in this transcription; the data sets are NLTCS, MSNBC, KDD 2k, Plants, Audio, Jester, Netflix, Accidents, Retail, Pumsb-star, DNA, Kosarak, MSWeb, Book, EachMovie, WebKB, Reuters, Newsgrp, BBC, and Ad.]

Figure 2: Negative log-likelihood values versus number of iterations for PGD, EG, SMA and CCCP.

The computational complexity per update is $O(|S|)$ in all four algorithms. CCCP often takes less time than the other three algorithms because it takes fewer iterations to converge. We list detailed running-time statistics for all four algorithms on the 20 data sets in the supplementary material.

4.3 Fine Tuning

We combine CCCP as a fine-tuning procedure with the structure learning algorithm LearnSPN and compare it to the state-of-the-art structure learning algorithm ID-SPN [15]. More concretely, we keep the model parameters learned by LearnSPN and use them to initialize CCCP. We then update the model parameters globally using CCCP as a fine-tuning technique. This normally helps to obtain a better generative model since the original parameters are learned greedily and locally during the structure learning algorithm. We use the validation set log-likelihood score to avoid overfitting. The algorithm returns the set of parameters that achieves the best validation set log-likelihood score as output. Experimental results are reported in Table 2. As shown in Table 2, the use of CCCP after LearnSPN always helps to improve the model performance. By optimizing model parameters on these 20 data sets, we boost LearnSPN to achieve better results than the state-of-the-art ID-SPN on 7 data sets, whereas the original LearnSPN only outperforms ID-SPN on 1 data set. Note that the sizes of the SPNs returned by LearnSPN are much smaller than those produced by ID-SPN. Hence, it is remarkable that by fine-tuning the parameters with CCCP we can achieve better performance despite the fact that the models are smaller. For a fair comparison, we also list the sizes of the SPNs returned by ID-SPN in the supplementary material.
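The fine-tuning protocol of this section can be sketched as follows (our illustration only; `cccp_step` and `validation_ll` are placeholder callables for a CCCP update and the validation-set log-likelihood):

```python
# Sketch of fine-tuning: initialize with LearnSPN's weights, update globally with CCCP,
# and return the parameters with the best validation log-likelihood to avoid overfitting.
def fine_tune(w_learnspn, cccp_step, validation_ll, max_iters=50):
    best_w, best_ll = w_learnspn, validation_ll(w_learnspn)
    w = w_learnspn
    for _ in range(max_iters):
        w = cccp_step(w)                      # one global CCCP (EM) update
        ll = validation_ll(w)
        if ll > best_ll:                      # keep the best validation score seen so far
            best_w, best_ll = w, ll
    return best_w
```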

5 Conclusion

We show that the network polynomial of an SPN is a posynomial function of the model parameters, and that parameter learning yields a signomial program. We propose two convex relaxations to solve the SP. We analyze the convergence properties of CCCP for learning SPNs. Extensive experiments are conducted to evaluate the proposed approaches and current methods. We also recommend combining CCCP with structure learning algorithms to boost the modeling accuracy.

Acknowledgments

HZ and GG gratefully acknowledge support from an ONR contract. HZ also thanks Ryan Tibshirani for the helpful discussion about CCCP.

References

[1] S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi. A tutorial on geometric programming. Optimization and Engineering, 8(1):67–127.
[2] H. Chan and A. Darwiche. On the robustness of most probable explanations. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence.
[3] M. Chiang. Geometric Programming for Communication Systems. Now Publishers Inc.
[4] A. Darwiche. A differential approach to inference in Bayesian networks. Journal of the ACM (JACM), 50(3).
[5] A. Dennis and D. Ventura. Greedy structure search for sum-product networks. In International Joint Conference on Artificial Intelligence, volume 24.
[6] R. Gens and P. Domingos. Discriminative learning of sum-product networks. In Advances in Neural Information Processing Systems.
[7] R. Gens and P. Domingos. Learning the structure of sum-product networks. In Proceedings of the 30th International Conference on Machine Learning.
[8] A. Gunawardana and W. Byrne. Convergence theorems for generalized alternating minimization procedures. The Journal of Machine Learning Research, 6.
[9] P. Hartman et al. On functions representable as a difference of convex functions. Pacific J. Math, 9(3).
[10] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63.
[11] G. R. Lanckriet and B. K. Sriperumbudur. On the convergence of the concave-convex procedure.
[12] R. Peharz. Foundations of Sum-Product Networks for Probabilistic Modeling. PhD thesis, Graz University of Technology.
[13] R. Peharz, S. Tschiatschek, F. Pernkopf, and P. Domingos. On theoretical properties of sum-product networks. In AISTATS.
[14] H. Poon and P. Domingos. Sum-product networks: A new deep architecture. In Proc. 12th Conf. on Uncertainty in Artificial Intelligence.
[15] A. Rooshenas and D. Lowd. Learning sum-product networks with direct and indirect variable interactions. In ICML.
[16] R. Salakhutdinov, S. Roweis, and Z. Ghahramani. On the convergence of bound optimization algorithms. In UAI.
[17] C. J. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics.
[18] A. L. Yuille and A. Rangarajan. The concave-convex procedure (CCCP). Advances in Neural Information Processing Systems, 2.
[19] W. I. Zangwill. Nonlinear Programming: A Unified Approach, volume 196. Prentice-Hall, Englewood Cliffs, NJ.
[20] H. Zhao, M. Melibari, and P. Poupart. On the relationship between sum-product networks and Bayesian networks. In ICML.
[21] H. Zhao, T. Adel, G. Gordon, and B. Amos. Collapsed variational inference for sum-product networks. In ICML.


More information

LDA Collapsed Gibbs Sampler, VariaNonal Inference. Task 3: Mixed Membership Models. Case Study 5: Mixed Membership Modeling

LDA Collapsed Gibbs Sampler, VariaNonal Inference. Task 3: Mixed Membership Models. Case Study 5: Mixed Membership Modeling Case Stuy 5: Mixe Membership Moeling LDA Collapse Gibbs Sampler, VariaNonal Inference Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox May 8 th, 05 Emily Fox 05 Task : Mixe

More information

Switching Time Optimization in Discretized Hybrid Dynamical Systems

Switching Time Optimization in Discretized Hybrid Dynamical Systems Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set

More information

INDEPENDENT COMPONENT ANALYSIS VIA

INDEPENDENT COMPONENT ANALYSIS VIA INDEPENDENT COMPONENT ANALYSIS VIA NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION Truth Rotate S 2 2 1 0 1 2 3 4 X 2 2 1 0 1 2 3 4 4 2 0 2 4 6 4 2 0 2 4 6 S 1 X 1 Reconstructe S^ 2 2 1 0 1 2 3 4 Marginal

More information

All s Well That Ends Well: Supplementary Proofs

All s Well That Ends Well: Supplementary Proofs All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee

More information

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations Optimize Schwarz Methos with the Yin-Yang Gri for Shallow Water Equations Abessama Qaouri Recherche en prévision numérique, Atmospheric Science an Technology Directorate, Environment Canaa, Dorval, Québec,

More information

AN INTRODUCTION TO NUMERICAL METHODS USING MATHCAD. Mathcad Release 14. Khyruddin Akbar Ansari, Ph.D., P.E.

AN INTRODUCTION TO NUMERICAL METHODS USING MATHCAD. Mathcad Release 14. Khyruddin Akbar Ansari, Ph.D., P.E. AN INTRODUCTION TO NUMERICAL METHODS USING MATHCAD Mathca Release 14 Khyruin Akbar Ansari, Ph.D., P.E. Professor of Mechanical Engineering School of Engineering an Applie Science Gonzaga University SDC

More information

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods Hyperbolic Moment Equations Using Quarature-Base Projection Methos J. Koellermeier an M. Torrilhon Department of Mathematics, RWTH Aachen University, Aachen, Germany Abstract. Kinetic equations like the

More information

Quantum Mechanics in Three Dimensions

Quantum Mechanics in Three Dimensions Physics 342 Lecture 20 Quantum Mechanics in Three Dimensions Lecture 20 Physics 342 Quantum Mechanics I Monay, March 24th, 2008 We begin our spherical solutions with the simplest possible case zero potential.

More information

Polynomial Inclusion Functions

Polynomial Inclusion Functions Polynomial Inclusion Functions E. e Weert, E. van Kampen, Q. P. Chu, an J. A. Muler Delft University of Technology, Faculty of Aerospace Engineering, Control an Simulation Division E.eWeert@TUDelft.nl

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January

More information

Connections Between Duality in Control Theory and

Connections Between Duality in Control Theory and Connections Between Duality in Control heory an Convex Optimization V. Balakrishnan 1 an L. Vanenberghe 2 Abstract Several important problems in control theory can be reformulate as convex optimization

More information

Two Dimensional Numerical Simulator for Modeling NDC Region in SNDC Devices

Two Dimensional Numerical Simulator for Modeling NDC Region in SNDC Devices Journal of Physics: Conference Series PAPER OPEN ACCESS Two Dimensional Numerical Simulator for Moeling NDC Region in SNDC Devices To cite this article: Dheeraj Kumar Sinha et al 2016 J. Phys.: Conf. Ser.

More information

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

Analyzing Tensor Power Method Dynamics in Overcomplete Regime Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical

More information

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

Modelling and simulation of dependence structures in nonlife insurance with Bernstein copulas

Modelling and simulation of dependence structures in nonlife insurance with Bernstein copulas Moelling an simulation of epenence structures in nonlife insurance with Bernstein copulas Prof. Dr. Dietmar Pfeifer Dept. of Mathematics, University of Olenburg an AON Benfiel, Hamburg Dr. Doreen Straßburger

More information

ALGEBRAIC AND ANALYTIC PROPERTIES OF ARITHMETIC FUNCTIONS

ALGEBRAIC AND ANALYTIC PROPERTIES OF ARITHMETIC FUNCTIONS ALGEBRAIC AND ANALYTIC PROPERTIES OF ARITHMETIC FUNCTIONS MARK SCHACHNER Abstract. When consiere as an algebraic space, the set of arithmetic functions equippe with the operations of pointwise aition an

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information