Adapting Bayes Nets to Non-stationary Probability Distributions

Size: px
Start display at page:

Download "Adapting Bayes Nets to Non-stationary Probability Distributions"

Transcription

1 Adapting Bayes Nets to Non-stationary robability Distributions Søren Holbech Nielsen & Thomas D. Nielsen June 4, Introduction Ever since earl (1988) published his seminal book on Bayesian networks (BNs), the formalism has become a widespread tool for representing, eliciting, and discovering probabilistic relationships for problem domains defined over discrete variables. One area of research that has seen much activity is the area of learning the structure of BNs, where probabilistic relationships for variables are discovered, or inferred, from a database of observations of these variables (main papers in learning include (Cooper and Herskovits 1992), (Heckerman, Geiger, and Chickering 1994), and (Chickering 2002)). One part of this research area focuses on incremental learning of BNs, where observations are received sequentially, and a BN structure is gradually constructed along the way without keeping all the observations in memory. A special case of incremental learning is structural adaptation, where the incremental algorithm maintains one or more candidate structures and applies changes to these as observations are received. This particular area of research has received very little attention, with the only results that we are aware of being (Buntine 1991), (Lam and Bacchus 1994) (and later (Lam 1998)), (Friedman and Goldszmidt 1997), and (Alcobé 2004). These papers all assume that the database of observations has been produced by a stationary stochastic process. That is, the ordering of the observations in the database is inconsequential. However, many real life observable processes cannot really be said to be invariant with respect to time: Mechanical mechanisms tend to become worn out as time progresses, for instance, and influences not observed in the data may change suddenly. When human decision makers are somehow involved in the data generating process, these are almost surely not fully describable by the observables 1

2 and may change their behaviour instantaneously if some advantage can be gained. A very simple example of a situation, where it is unrealistic to expect a stationary generating process, is a database storing business transactions and internal communications of some company. If at some point a high ranking employee is replaced, the subsequent data would be distributed differently from that representing the time before the change. In this work we relax the assumption on stationary data, opting instead for learning from data which is only approximately stationary. More concretely, we assume that the data generating process is piecewise stationary, as in the example given above. Furthermore, we focus on domains where the shifts in distribution from one stationary period to the next is of a local nature (i.e. only a subset of the probabilistic relationships among variables change as the shifts take place). As the assumptions underlying the method are generalizations of those carrying other structural adaptation methods, we may even hope that our method is competitive for incremental learning from data generated by stationary distributions. 2 reliminaries Here we present the definitions and terminology necessary to understand the remainder of the text. As a general notational rule we use bold font to denote sets and vectors (V, c, etc.) and calligraphic font to denote mathematical structures and compositions (B, G, etc.). Moreover, we shall use upper case letters to denote random variables or sets of random variables (X, Y, V, etc.), and lower case letters to denote specific states of these variables (x 4, y, c, etc.). is used to denote defined as. A BN B (G,Φ) over a set of discrete variables V consists of an acyclic directed graph (traditionally abbreviated DAG) G, whose nodes are the variables in V, and a set of conditional probability distributions Φ (which we abbreviate CTs for conditional probability table ). A correspondence between G and Φ is enforced by requiring that Φ consists of one CT (X A G (X)) for each variable X, specifying a conditional probability distribution for X given each possible instantiation of the parents A G (X) of X in G. A unique joint distribution B over V is obtained by taking the product of all the CTs in Φ. In a graph G we shall often need to distinguish between different types of connections on paths between nodes: For three nodes X, Y, and Z next to each other on some path a, we say that the connection at Y is serial, if arcs X Y and Y Z are part of a; if arcs X Y and Y Z are part of a, 2

3 we call the connection diverging; and if arcs X Y and Y Z are part of a, we call it converging. If for such a converging connection, we furthermore have that X and Z are not adjacent in G, then we call the triple (X,Y,Z) a v-structure of G. Due to the construction of B we are guaranteed that all dependencies inherent in B can be read directly from G by use of the d-separation criterion (earl 1988): For any path a between two nodes X and Y in G, we say that a is active given a set of nodes Z, if for each node W on the path, either the connection at W is serial or diverging, and W is not in Z, or the connection is converging, and either W or one of its descendants are in Z. If there is no active path between two nodes X and Y given Z, then we say that X and Y are d-separated given Z. The d-separation criterion states that, if X and Y are d-separated by Z, then it holds that X is conditionally independent of Y given Z in B, or equivalently, if X is conditionally dependent of Y given Z in B, then X and Y are not d-separated by Z in G. In the remainder of the text, we use X G Y Z to denote that X is d-separated from Y by Z in the DAG G, and X Y Z to denote that X is conditionally independent of Y given Z in the distribution. The d-separation criterion is thus X G Y Z X B Y Z, (1) for any BN B (G,Φ). The set of all conditional independence statements that may be read from a graph in this manner, we refer to as that graph s d-separation properties. We refer to any two graphs over the same variables as being equivalent if they have the same d-separation properties. Equivalence is obviously an equivalence relation. Verma and earl (1991) proved that equivalent graphs necessarily must have the same skeleton and the same v-structures, and the equivalence class of graphs equivalent to a specific graph G can then be uniquely represented by the partially directed graph G obtained from the skeleton of G by directing links that participate in a v-structure in G in the direction dictated by G. G is called the pattern of G. Any graph G, which is obtained from G by directing the remaining undirected links, without creating a directed cycle or a new v-structure, is then equivalent to G. We say that G is a consistent extension of G. The partially directed graph G obtained from G, by directing undirected links as they appear in G, whenever all consistent extensions of G agree on this direction, is called the completed pattern of G. G is obviously a unique representation of G s equivalence class as well. Any arc in G is called compelled in G. 3

4 Given any joint distribution over V it is possible to construct a BN B such that = B (earl 1988). A distribution for which there is a BN B (G,Φ ) such that B = and also X Y Z X G Y Z (2) holds, is called DAG faithful, and B (and sometimes G alone) is called a perfect map. DAG faithful distributions are important since, if a data generating process is known to be DAG faithful, then a perfect map can, in principle, be inferred from the data under the assumption that the data is representative of the distribution. (earl 1988) proves that the following important properties holds of any DAG faithful probability distributions : A B D C A B C A D C (3) A B D C A B C D. (4) For any probability distribution over variables V and variable X V, we define a Markov boundary of X to be a set S V \ {X} such that X V \ (S {X}) S and this holds for no proper subset of S. It is easy to see that if is DAG faithful, the Markov boundary of X is uniquely defined, and consists of X s parents, children, and children s parents in a perfect map of. In the case of being DAG faithful, we denote the Markov boundary by MB (X). For any DAG G, we denote the set of X s parents, children, and children s parents in G as MB G (X). 3 An Adaptive Approach We now present our method for structural adaptation to a piecewise stationary data sequence. We first describe the problem formally, and then the proposed method is presented. 3.1 The Adaptation roblem We say that a sequence is samples from a piecewise DAG faithful distribution, if the sequence can be partitioned into sets such that each set is a database sampled from a single DAG faithful distribution. The rank of the sequence is the size of the smallest such partition. Formally, let d = (d 1,...,d l ) be a sequence of observations over variables V. We say that the sequence is sampled from a piecewise DAG faithful distribution (or simply that it is a piecewise DAG faithful sequence), if there are indices 1 = i 1 < < i m = l, such that each of d (j) = (d ij,...,d ij+1 ), for 1 j m 1, is a sequence 4

5 of samples from a DAG faithful distribution. The rank of the sequence is defined to be min j i j+1 i j, and we say that m 1 is its size and l its length. Obviously, we can have any sequence of observations being indistinguishable from a piecewise DAG faithful sequence, by selecting the partitions small enough, so we restrict our attention to sequences that are piecewise DAG faithful of at least rank r. However, we do not assume that neither the actual rank nor size of the sequence are known, and specifically we do not assume that the indices i 1,...,i m are known. We will assume that each sample is complete, though, so that no observations in the sequence have missing values. The learning task that we address in this paper consists of incrementally learning a BN, while receiving a piecewise DAG faithful sequence of samples, and making sure that after reception of each sample point the BN structure is as close as possible to the distribution that generated this point. Formally, let d be a complete piecewise DAG faithful sample sequence of length l, and let t be the distribution generating sample point t. Furthermore, let B 1,..., B l be BNs output (or maintained) by a structural adaptation method M, when receiving d and starting with BN B. Given a distance measure on BNs d, we define the precision of M on d wrt. d as prec(m, B,d) 1 l l d(b i, B i ). (5) i=1 For a method M to adapt to a DAG faithful sample sequence d wrt. d then means that M has a higher precision than the trivial learner (which never learns) on d wrt. d. 3.2 Our Approach The method that we propose here builds on the idea of automating the human practice of revising models only when needed, and then only the parts of the models that are in need of a revision. The method consists of two main mechanisms: One, monitoring the current BN while receiving evidence, detecting when the model should be changed, and two, adapting the parts of the model that conflicts with the evidence. These two mechanisms roughly corresponds with Algorithms 1 and 2. More detailed, our method maintains histories of conflict measures for the nodes in the current BN along with recent observations, and then initiate a local search over a set of nodes if the conflict measure histories of these nodes indicate that the underlying distribution has changed. A local 5

6 search means that we assume that only some aspects of the perfect maps of the two underlying generating distributions differ. The general form of the method is given in Algorithm 1. We have designed it to use a number of helper functions, whose exact implementation is open for experimentation as long as they fulfill a number of requirements. The responsibilities of ConflictMeasure( ) and ChangeInStream( ) will be described shortly. The actual technical implementation choices we have made for these methods are presented in relation to our experimental results in Section 5. The specification of UpdateNet( ) is given in Algorithm 2, and is elaborated upon later. The role of the rest of the methods should be self-explanatory, and their implementation is trivial. Algorithm 1 Algorithm for BN adaption. Takes as input an initial network B, defined over variables V, a series of cases h, and an allowed delay k for determining shifts in h. 1: procedure Adapt(B, V, h, k) 2: h [] 3: c X [] X V 4: loop 5: x NextCase(h ) 6: Append(h, (x)) 7: S 8: for X V do 9: c ConflictMeasure(B, X, x) 10: Append(c X, c) 11: if ChangeInStream(c X, k) then 12: S S {X} 13: end if 14: end for 15: h LastKEntries(h, k) 16: if S then 17: UpdateNet(B, S, h) 18: end if 19: end loop 20: end procedure The responsibilities of the two helper methods are as follows: ConflictMeasure(B, X i, x) returns a real value reflecting how inconsistent the observation x is with the way X i connects to the rest of the model B. ChangeInStream(c X, k) is a boolean valued function that, given a sequence of conflict measure values for X, tells if there appears to be a jump between the concrete values of the last k values and the values before that; 6

7 such a jump would be an indication of an abrupt change in the underlying generating distribution. Algorithm 2 Update Algorithm for BN. Takes as input the network to be updated B, a set of variables whose structural bindings may be wrong S, and data to learn from h 1: procedure UpdateNet(B, S, h) 2: for X S do 3: M X UncoverMarkovBoundary(X, h) 4: G X UncoverAdjacencies(X, M X, h) 5: artiallydirect(x, G X, M X, h) 6: end for 7: G MergeFragments({G X } X S ) 8: G GraphOf(B) 9: (G, C) MergeAndDirect(G, G, S) 10: Φ 11: for X V do 12: if X C then 13: Φ Φ { h (X A G (X))} 14: else 15: Φ Φ { B (X A G (X))} 16: end if 17: end for 18: B (G,Φ ) 19: end procedure UpdateNet(B, S, h) in Algorithm 2 adapts the BN B around the nodes in S to fit the empirical distribution defined by the data in h, which will be the last k received cases at the time of invocation. Throughout the text, this empirical distribution will also be denoted h which meaning h takes on should be clear from the context. The update of B to fit h is based on the assumption that only nodes in S need updating of their probabilistic bindings (i.e. their families in B h ). The method is listed in Algorithm 2. As Adapt( ) in Algorithm 1, it employs a number of helper methods that we only specify semantically. The concrete implementations of these methods that we have used in our experiments are presented in Section 5. UpdateNet( ) first runs through the nodes in S and uncovers for each node X the subgraph 1 of Bh consisting of X and links and arcs connecting X to its adjacents. The construction of this subgraph is divided into three steps: UncoverMarkovBoundary(X, h), which should compute MB h (X), or 1 Subgraph is used in a loose sense here, as the graph fragments produced do not contain links and arcs between X s adjacent variables. 7

8 at least a superset thereof; UncoverAdjacencies(X, M X, h), which first computes the nodes in M X that, according to h, are inseparable from X given subsets of M X, and then constructs a simple graph structure, where X is connected to each of these nodes by a link; and, finally, artiallydirect(x, G X, M X, h), which directs some of the links in G X. This method is presented in Algorithm 3 and elaborated upon later in this section. After constructing network fragments for all nodes in S, Algorithm 2 merges these into a single (not necessarily connected) graph G in Merge- Fragments( ). This happens as a simple graph union. 2 The result may still contain undirected arcs. G is merged with the a partially directed graph based on the subgraph of B s graph G that corresponds to nodes not in S into a new graph G by MergeAndDirect( ). MergeAndDirect( ) is described in the end of this section. MergeAndDirect( ) returns both the joined graph G and a set C which contains all nodes who have got a new parent set (ideally a subset of S). Algorithm 2 ends by constructing new CTs for the newly constructed BN structure G. Nodes not in C are assigned the same CT as in B, and nodes in C have new CTs calculated from h. Algorithm 3 Uncovers the direction of some arcs adjacent to a variable X in Bh. G X (X) is nodes connected to X by a link in G X. 1: procedure artiallydirect(x, G X, M X, h) 2: DirectAsInOtherFragments(G X ) 3: for Y M X \ (NE GX (X) A GX (X)) do 4: for T M X \ {X, Y } do 5: if I h (X, Y T) then 6: for Z NE GX (X) \ T do 7: if I h (X, Y T {Z}) then 8: LinkToArc(G X, X, Z) 9: end if 10: end for 11: end if 12: end for 13: end for 14: end procedure The method artiallydirect(x, G X, M X, h) directs a number of links in the graph fragment G X in accordance with the direction of these in B h. With DirectAsInOtherFragments(G X), the procedure first ex- 2 Due to the inner workings of artiallydirect( ) no conflicts of orientation among nets can occur. 8

9 change links for arcs when previously directed graph fragments dictate this. The procedure then runs through each variable Y in M X not adjacent to X, finds a set of nodes in M X that separate X from Y, and then tries to re-establish connection to Y by repeatedly expanding the set of separating nodes by a single node adjacent to X. If such a node can be found it has to be a child of X (see Corollary 16). No arc in Bh originating from X, is left as a link by the procedure (see Lemma 13). During these steps the procedure uses an independence oracle I h derived from h. The independence oracle should return true for a call I h (X,Y T) iff X h Y T. Algorithm 4 Merges graphs G and G under the assumption that connection to nodes in S have just been relearned. 1: procedure MergeAndDirect(G, G, S) 2: E EdgesOf(G ) 3: for X / S do 4: for Y AD G (X) \ S do 5: E E {(X, Y ), (Y, X)} 6: end for 7: end for 8: for X / S do 9: for Y AD G (X) \ S do 10: if Z MB G (X) \ AD G (X) : Y is descendant of Z then 11: E E \ {(Y, X)} 12: end if 13: end for 14: end for 15: return Direct((V, E)) 16: end procedure MergeAndDirect(G, G, S) in Algorithm 4 merges the graphs G and G by adding links between nodes in V \S in G, if these are adjacent in G. Then it directs some of these links: A link X Y is directed as X Y if X Y is an arc in G, and Y is a descendent of some node Z in MB G (X)\AD G (X) through a path involving only children of X. That this particular rule is sensible is proved in Section 4. Links and arcs between nodes in S and those not in S should theoretically be the same, but due to flawed statistical tests, in practice they are set either as defined in G or as defined in G, depending on how aggressive the adaptation should be. Initial experiments seemed to indicate that the latter option yields a small performance increase, and this is the setting that we have used in the end experiments, and the setting that Algorithm 4 has been written to conform to. 9

10 After the merge, any remaining links in the joined graph are directed by Direct( ) by application of the following rules 1. if X Y is a link, Z X is an arc, and Z and Y are non-adjacent, then direct the link X Y as X Y, 2. if X Y is a link and there is a directed path from X to Y, then direct the link X Y as X Y, and 3. if Rules 1 and 2 cannot be applied, chose a link at random, and direct it randomly. Due to potentially flawed statistical tests, the resultant graph may contain cycles involving at least some nodes in S. These are eliminated by reversing arcs between nodes in S only. The reversal process resembles the one used in (Margaritis and Thrun 2000): We remove all arcs between nodes in S that appears in at least one cycle. We order the removed arcs according to how many cycles they appear in, and then insert them back in the graph, starting with the arcs that appear in the least number of cycles. When at some point the insertion of an arc gives rise to a cycle, we insert the arc as its reverse. 4 Formal Correctness We have obtained a proof ensuring that under a set of assumptions our adaptation method is correct, in the sense that when the underlying distribution generating the sequence of data changes, the algorithm reacts and changes the current BN to accurately reflect the perfect map of the new underlying distribution. Moreover, the result makes it clear what it means for these changes to be local, probabilistically speaking. First, we need a few definitions, on which the assumptions are based: Definition 1. Let be a DAG faithful probability distribution over variables V, X in V, and Y in MB (X). We say that a set T is a maximal separating set of Y from X wrt. if T MB (X) \ {Y }, Y X T, and this holds for no T, where T T MB (X) \ {Y }. 10

11 The set of all variables in V, for which a maximal separating set from X wrt. exists, we denote by S X (the intuition being separable from ). For each Y in S X Y X, we denote by T the intersection of all maximal separating sets of Y from X wrt., and by E Y X the set MB (X) \ (T Y X {Y }) ( dependency enabling ). We expect, but have been unable to prove, that there is only one maximal separating set of Y from X wrt. for any DAG faithful probability distribution. Definition 2. We say that a DAG faithful probability distribution 1 over variables V is similar to a DAG faithful probability distribution 2 over variables V on the variables S V, if we have that 1. MB 1 (X) = MB 2 (X), for all X S, 2. S X 1 = S X 2, for all X S, and 3. E Y X 1 \ S X 1 = E Y X 2 \ S X 2, for all X S and Y S X 1. Here Bullet 3 states that inseparable variables enabling dependencies between X and Y must be the same in both 1 and 2. A few points that should be noted, are that similarity on a set of variables S is an equivalence relation, and that two distributions similar on a set of variables S are also similar on any subset of S. Our main formal result is: Theorem 3. Let BN B 1 (G 1,Φ 1 ) and DAG faithful probability distributions 1 and 2 each be defined over variables V, and h be a DAG faithful sample sequence of size 2 and rank r, where 1 is the distribution of the first sample and 2 is the distribution of the last sample. Furthermore, let 1 be similar to 2 on S, and B 2 (G 2,Φ 2 ) be the result of running Algorithm 1 on B 1 with cases h and some choice of k. If 1. G 1 is equivalent to G 1, 2. r > k, 3 3. ChangeInStream(, k) is true for a variable X iff X / S and the algorithm is currently processing the k th sample drawn from 2, 3 Note that Bullet 6 incorporates the traditional assumption on independencies being identifiable from sufficient data, and that Bullet 2 is only concerned with ensuring that ChangeInStream( ) always has at least k cases from a new partition of h to detect the change. 11

12 4. UncoverMarkovBoundary(X, h) returns MB h (X), 5. G X returned from UncoverAdjacencies(X, M X, h) connects X to Y iff Y is in MB h (X) \ S X h, and 6. the oracle used in artiallydirect( ) is correct, then G 2 is equivalent to G 2. The theorem is important as it guarantees that if our method is started with a BN that is equivalent to the perfect map of some DAG faithful distribution, then no matter how often changes, as long as it remains DAG faithful, and at least 2k observations are made from between each change, then our method continues to adapt B to fit the distribution, with a delay of k observations. In what follows, we gradually build up a theory to support Theorem 3, and then give the proof for it. The main difficulty is that our method directs arcs between a node and its adjacents, without knowing how these adjacents interconnect. revious works on constraint based DAG discovery have been based on two phase algorithms that first uncover the skeleton of the DAG and then direct the links into arcs. That has not been possible in our case, where local parts of the graph have to be learned in isolation from the remaining graph. First, we need some small results that we have been unable to find proved elsewhere: Lemma 4. Let be a DAG faithful probability distribution over V, and let X Y S for X,Y V and all S MB (X) \ {Y }. Then Y X T for all T V \ {X,Y }. roof. First note that if X Y S for all S MB (X) \ {Y }, then it follows that Y must be in MB (X). As is DAG faithful, there is some DAG G which is a perfect map of. If there is an arc between X and Y in G, then there is no set T V \ {X,Y }, such that X G Y T and since G is a perfect map of that X Y T. Therefore, we just need to prove that X and Y must be adjacent if X Y S for all S MB (X) \ {Y } = MB G (X) \ {Y }: As Y is a member of MB G (X) it must either be adjacent to X or share a child with X. In the first case, the result follows, so we assume that X and Y share the children Z. It follows that Y cannot be a descendent of any variable in Z, and hence by the definition of a BN that X G Y S MB G (X) \ (Z {Y }). Thus, there is a set S MB G (X) \ {Y }, where 12

13 X G Y S, and since G is a perfect map of also X Y S. This violates the assumptions, so we conclude that X and Y must be adjacent. Corollary 5. Let be a DAG faithful probability distribution over V and X,Y V. If there is a set T V \ {X,Y } such that Y X T, then there is a set S MB (X) \ {Y }, such that Y X S. Since perfect maps of a distribution are equivalent and equivalent DAGs have the same skeleton, it follows from Corollary 5 that Corollary 6. Let be a DAG faithful probability distribution over V and X,Y V. Then Y / S X iff Y is adjacent to X in all perfect maps of, and Y S X iff Y is not adjacent to X in any perfect map of. Given the above result it is thus sufficient for a learning algorithm to be able to identify S X when learning the skeleton of G. Clearly, this can be seen as a localized operation. Now, we deduce a set of lemmas that are needed for local reasoning about direction of arcs. Lemma 7. Let be a DAG faithful probability distribution over variables V, X,Y,Z V, T V \ {X,Y,Z}, and G be a perfect map of. If X Y T and X Y T {Z}, then in G there is a node C, with non-adjacent parents A and B, such that Z is C or a descendent of C on a path involving no nodes in T, A is connected to X with a path that is active given T, and B is connected to Y with a path that is active given T. Trivially, we also have X G Z T and Y G Z T. roof. Since G is a perfect map of, it is clear that X G Y T and X G Y T {Z}. From the definition of d-separation, it follows that there is a node C, such that either Z is C or Z is a descendent of C along a path c involving no nodes in T, and C has two parents A and B, where A (B) is either X (Y ) or connected to X (Y ) on a path a (b) that is active given T {Z}. We need to show that for at least one such C, 1. both a and b are active given T alone, and 2. that A and B cannot be adjacent. We first show that there is a C fulfilling 1, and then that at least one of these Cs fulfills 2. Take C such that a is active given T {Z} but not T. This means that there is a converging connection at some node V on a, such that Z is V or 13

14 a descendent of V. WLOG let V be the V closest to X on a, and let a be the path from X to V, and v be the path from V to Z. Moreover, let U be the node furthest away from Z on v such that U is also on c, and let u be the part of c (or v) from U to Z, v be the part of v from V to U, and c the part of c from C to U. Then U satisfies the definition of C since the three paths a v, u, and c b connect X to U, U to Z, and U to B, and the connection at U between a v and c b is converging. However, unlike a, a v is active given T only. If b is also only active given T {Z}, we can perform the same transformation to that path and thus we get that 1 is satisfiable. Next, we show that given Cs that satisfy 1, we can find one that also satisfies 2. articularly, let C satisfy 1, and furthermore let C be chosen such that c has maximal length among all valid choices of C. Then A and B must be non-adjacent. Suppose this is not the case, and WLOG let the arc between A and B be oriented A B. This means that B cannot be Y as that would create a path from X to Y active given T only, contradicting that X G Y T. So consider the arc between B and the next node V on b. This arc cannot be directed into B as that would mean that B is a valid choice for C, yet with the length of the path from B to Z through C being longer than c, which contradicts the choice of C. But if the arc between B and V is directed away from B then we have that aa Bb is an acive path given T, which again contradicts X G Y T. Thus A and B are non-adjacent. Lemma 8. Let be a DAG faithful probability distribution over V, X,Z V, Y S X, T V \ {X,Y,Z}, and Z MB (X) \ (T S X {Y }). If X Y T and X Y T {Z}, then for any DAG G, which is a perfect map of, Z is in CH G (X). roof. As Z is not in S X, Corollary 6 yields that Z and X must be adjacent in every DAG that is a perfect map of. All that is left to be shown is that Z cannot be a parent of X. Assume this is false, and let G be a perfect map of in which Z is a parent of X. As X Y T and G is a perfect map of, it follows that X G Y T. (6) Specifically, (6) in conjunction with the assumption that X is a child of Z implies that there can be no path between Z and Y, which is active given T, i.e. Z G Y T. (7) 14

15 But from X Y T and X Y T {Z}, Lemma 7 allows us to conclude that Z G Y T, which contradicts 7. Lemma 9. Let be a DAG faithful probability distribution over V, X V, Y S X, T MB (X) \ {Y }, and Z MB (X) \ (T S X {Y }). If X Y T and X Y T {Z}, then there is W S X such that Z E W X. roof. Let G be a perfect map of. We show that there is a W S X such that X G W T (8) and X G W T T {Z}, (9) for some set T MB G (X)\{W,Z} and any T MB G (X)\(T {W,Z}). The result then follows by G being a perfect map of. From the assumptions, it is clear that there is at least one W S X (namely Y ), such that (8) and X G W T {Z}, (10) if we choose T as T. Since G is a perfect map of, we have that (8) implies X W T, (11) and (10) implies X W T {Z}. From Lemma 8 it follows that Z is a child of X in G. Moreover, from Lemma 7 we get that there must be at least one active path between W and Z. WLOG let a be the shortest such path connecting Z to such a W. We prove that the existence of a implies that (9) is fulfilled. First note that the last arc in a (connecting to Z) must be directed into Z, as otherwise there would be an active path from X to W given T, contradicting (8). Also note that if the length of a is 1, then we have the situation where Z is a child of both W and Z, and hence that a trivially ensures that (9) is fulfilled. Assume therefore that a has length 2 or more. We prove that the existence of a implies (9) by contradiction. That is we assume that there is a set T MB G (X) \ (T {W,Z}) such that X G W T T {Z}, and we prove that this means that a cannot be the shortest path matching its definition. As a is active given T and Z is a child of X, it follows that the nodes in T must render a inactive, which can only happen if some of the nodes in T lie on a. Let U be the node in T closest to W on a such that the connection at U is either serial or diverging (meaning that U renders a inactive). There are three possible scenarios: 15

16 1. a : (W U Z) 2. a : (W U Z) 3. a : (W U Z) Let b be the part of a running between W and U and c be the part between U and Z. We show that U can serve the same role as W, and hence that a cannot be the shortest active path connecting Z to a suitable W. Assume first that either Scenario 1 or 2 are the case: If X G U T, then there is an active path d between X and U given T, and hence also an active path from W to X given T formed by joining b and d. But that means that X G W T, which contradicts (8). Thus we must have that X G U T. Furthermore, since U is on a, it follows that X G U T {Z}. But then U satisfies (8) and (10) by the virtue of the active path c, and hence a is not the shortest such path. Thus none of Scenarios 1 and 2 can be the case. Assume then that Scenario 3 is the case: First we note that it cannot be the case that X G U T since b is active given T and this implies that (8) and (10) holds for U, thus violating the definition of a as the shortest path. Hence, X G U T meaning that there is an active path d from X to U given T. Since b is also active given T and (8) is assumed, it follows that the arc attached to U in d, must be an arc going into U. But since that results in a converging connection at U joining b and d, and X G W T, we have that no descendent of U can be in T. Specifically, we have that c cannot contain any nodes of T, and hence that it must be a directed path from U to Z. First assume that c contains some nodes that are adjacent to X. None of these can be parents of X, as the arc from the parent node to X along with the active path a, would mean that X G W T, contradicting (8). Thus nodes on c in MB G (X) are either children of X or non-adjacent to X. U itself must be non-adjacent to X, as U cannot be a child of X: That would imply that X G U R, for any set R, and since b is active given T T {Z} and there is a converging connection at U on the path formed by b and the arc from X to U, that X G W T T {Z}, violating assumption (10). Since U is not adjacent to X, we thus have that there is some node on c non-adjacent to X and closer to Z on c than any other node. Let V be this node. Since Z is a child of X and a direct descendant of V, it is easy to see that V satisfies both (8) and (10) and thus that a cannot be the shortest path connecting Z to a satisfying node, and the lemma is thus proved. 16

17 These basic but important lemmas aside, we now establish a connection between the purely probabilistic notion of similarity, and the DAGs that ensure it. Since we have that two equivalent DAGs G 1 and G 2 encode the same conditional independency statements, and these are reflected in any probability distributions whose perfect maps are one of G 1 and G 2, the following corollary is immediately obtained. Corollary 10. Let G 1 and G 2 be perfect maps of probability distributions 1 and 2 both defined over variables V. If G 1 and G 2 are equivalent, then 1 and 2 are similar on V. This connection spurs the introduction of a new concept: Definition 11 (Locally Compelled Arc). Let G 1 be a perfect map of a probability distribution 1 over V, and X and Y in V, such that X is a parent of Y in G 1. We call the arc from X to Y locally compelled in G 1, if for any probability distribution 2 similar to 1 on {X}, and any perfect map G 2 of 2, we have that X is a parent of Y in G 2. We denote by CH G (X) the variables that are attached to X in G by locally compelled arcs originating from X. It turns out that locally compelled arcs are important for learning network structures locally: Locally compelled arcs of a graph encompass all arcs of the pattern of that graph, and enjoy the property of being compelled too, as the following results show. Lemma 12. Let be a DAG faithful probability distribution over V, X V, Y S X, and T a maximal seperating set of Y from X wrt.. Then for any DAG G, which is a perfect map of, each member Z of MB (X) \ (T {Y } S X) must be a compelled child of X. That is, Z is in CH G (X). roof. We need to show that Z is a child of X in any perfect map of any probability distribution 2 similar to on {X}. Since T is a maximal separating set, it follows that Z E Y X. Since and 2 are similar on X, Y S X X, and Z EY \ S X, we have that Y SX 2 and Z E Y X 2 \ S X 2. Hence, there is a (maximal separating) set T 2 MB 2 (X) not including Z, such that X 2 Y T 2 and X 2 Y T 2 {Z}. The result then follows from Lemma 8. Lemma 13. Let be a DAG faithful probability distribution over V, and X,Y,Z V. Then for any DAG G, which is a perfect map of, if (X,Z,Y ) 17

18 is a v-structure in G, then Y S X X, Z EY \S X, and X Z and Y Z are both locally compelled. roof. We only show that X Z is locally compelled, as the case of Y Z is similar. Since Y is not adjacent to X, Corollary 6 implies that Y S X, and hence that E Y X is well-defined. Clearly, Z cannot be in any maximal seperating set of Y from X wrt., as G is a perfect map of and X G Y T {Z} for any set T. Furthermore, as Z is adjacent to X, Corollary 6 ensures that Z / S X. The result now follows from Lemma 12. From Definition 11, Corollary 10, and Lemma 13, we thus have: Corollary 14. For any DAG G and any arc participating in a v-structure, we have that this arc is locally compelled in G. Furthermore, we have that each locally compelled arc in G is compelled in G. At this point, we have that it is sufficient to require of an algorithm uncovering G, for some DAG faithful distribution, that it identifies the skeleton of G, identifies all locally compelled arcs, directs them, and then eliminates arcs not participating in v-structures. Moreover, any member of G s equivalence class can be constructed by identifying all locally compelled arcs and then directing the remaining arcs without introducing cycles or new v-structures. Formally, we can restate the corollary in terms that emphasize the connection between similarity and equivalence: Corollary 15. Let G 1 = (V,E 1 ) and G 2 = (V,E 2 ) be DAGs. Then G 1 is similar to G 2 on V iff G 1 is equivalent to G 2. Now, we proceed by asserting that the arcs directed by our method are indeed locally compelled, and that no locally compelled arcs are left undirected. From Lemmas 9 and 12 we immediately get: Corollary 16. Let be a DAG faithful probability distribution over V, X V, Y S X, T MB (X)\{Y }, and Z MB (X)\(T S X {Y }). If X Y T and X Y T {Z}, then for any DAG G, which is a perfect map of, Z is in CH G (X). We thus have that all arcs directed by artiallydirect( ) are locally compelled. That a arc X Y participating in a v-structure (X, Y, Z) will be found by artiallydirect( ), follows from Lemma

19 Lemma 17. Let be a DAG faithful probability distribution over V, X,Y V, and G be a perfect map of. There is a W S X such that Y E W X \ S X iff there is Z MB G(X) S X such that Y is a descendent of Z through a path encompassing only children of X. roof. The if part follows trivially from the d-separation properties of graphs, and we therefore only prove the only if part: Since Y E W X \S X we have from Lemma 12 that Y is a child of X in G. Moreover, we know that there is a set T MB (X) = MB G (X) such that X G W T, but X G W T {Y }, and thus by Lemma 7 that in G there is a node C with non-adjacent parents A and B, such that Z is a descendent of C on path c, and there are active paths a and b given T between X and A and between B and W. Let V be the first node on c following Y. As V is also a descendent of C it follows that it is also in E W X. If V is furthermore in S X then we have found the node Z that we are looking for, so assume that this is not the case. Then Lemma 12 can be applied again with V taking Y s place. Iterating in this manner we either find the Z or we get that each node on c must be a child of X. Specifically this holds for C. But then either B is the node Z we are looking for, or there is an active path from X to W given T, consisting of the connection between X and B along with b. The latter possibility is precluded by the assumption X G W T. A direct consequence of this graphical result and d-separation properties on graphs is: Corollary 18. Let be a DAG faithful probability distribution over V, X,Y V, and G be a perfect map of. There is a W S X such that Y E W X \ S X iff there is Z SX MB (X) such that Y E Z X \ S X. Lemma 19. Let be a DAG faithful probability distribution over V, X,Y V, and G be a perfect map of. We then have that Y CH G (X) iff there is a W S X X such that Y EW \ S X. roof. The if part follows immediately from Lemma 12. We therefore only treat the only if part: Since X Y is an arc in G it follows from Corollary 6 that Y / S X W X, so what we need to prove is that Y W for some W V. We do so by contradiction, assuming that Y is not in any such E W X, and showing that any distribution 2, having the graph G 2 resulting from reversing X Y as its perfect map, is similar to on X, and thus that X Y cannot be locally compelled. 19

20 We assume that 2 is not similar to on X, and hence that at least one of oints 1 to 3 in Definition 2 is violated. First we note that it cannot be the case that there is a v-structure X,Y,Z, as Y would then be in E Z X by Lemma 13. Hence, MB G (X) = MB G2 (X) and therefore MB (X) = MB 2 (X), so oint 1 is satisfied. Additionally, Corollary 6 means that the fact that no adjacencies are changed implies that oint 2 is satisfied. So it must be oint 3 that is violated, meaning that for some node W we have that E W X \S X \S X 2. Let Z be a node having either left or entered such an enabling set due to the arc reversal. First note that the assumption that Y is in no such sets means that if Z = Y, Y must have entered such a set, but then Lemma 12 states that X Y is a compelled arc in G 2 which contradicts the definition of G 2. Thus Z Y. We prove this is impossible, too, reaching the required contradiction. X EW 2 First assume that Z E W X but Z / E W X 2. This means that there is a set T MB (X) such that X G W T, but X G W T {Z}, and by Lemma 7 that in G there is a node C with non-adjacent parents A and B, such that Z is a descendent of C on path c, and there are active paths a and b given T between X and A and between B and W, respectively, but the reversal of X Y somehow destroys this setup. Clearly, X Y cannot be on c since that would mean that X G W T, and X Y cannot be on b, as that would either mean X G W T or require Y to participate in a v-structure at C, which we have established cannot be the case. Thus X Y is on a, and WLOG we take it to be the first arc of a. If the next arc on a is an arc out of Y, then exchanging X Y for X Y changes the connection at Y from being serial into being diverging, and then a is still active. Therefore the connection at Y must be converging. Since Y does not participate in a v-structure, X must be adjacent to the next node V on a. The connection between X and V must be directed towards V as otherwise there is guaranteed to be an active path from X to C without the help of X Y. We consider the two cases for the arc V U following Y V on a: If this is an arc out of V, then X V provides an active path between X and C, so we know this cannot be the case. So the arc is oriented towards V. Now V cannot itself participate in any v-structures, since by Lemma 17 that would imply that Y E R X for some R, which we have assumed is not the case. So U is also adjacent to X, and by repeating the previous arguments we continue to establish that X must be a parent of all nodes on a, including A. But then there is an active path X A C Bb between X and W without the help of X Y, and we conclude that Z cannot have left an enabling set due to the arc reversal. 20

21 Finally, assume that Z / E W X and Z E W X 2. As before Lemma 7 guarantee that we have the structure of paths a, b, and c along with node Z and set T, but this time active in G 2 and not in G. We know that X Y cannot be part of b as either that would mean X G2 W T or that X is required to participate in a v-structure and be part of T, defying the definition of T. Moreover, X Y cannot be on c, as that would either imply that X G2 W T or that X G2 W T {Y }, the latter being precluded by Corollary 16 and the definition of G 2 as a DAG where Y is a parent of X. So X Y must be on a, and we asumme WLOG that it is the first arc on a. Since reversing X Y breaks the setup we must have that the arc following it on a cannot be directed out of Y. Let V be the node connected to Y like this. Since Y does not participate in a v-structure in G, and only the direction of the link X Y differs between the two, this means that X must also be adjacent to V in G. As above we get that V, and all other nodes on a must be children of X, ending up with the active path X A C Bb between X and W contradicting the assumption that Z / E W X. Summing up we have: Corollary 20. Let be a DAG faithful probability distribution over V, G a perfect map of, and X Y an arc in G. Then the following are equivalent statements: X Y is locally compelled there is a W S X such that Y EW X \ S X, there is a W S X MB (X) such that Y E W X \ S X, there is a W V and T MB (X) such that X W T and X W T {Y }, and there is a W MB G (X) S X such that there is a directed path from W to Y in G running through only nodes in CH G (X). Thus there are strong correlations between the abstract probabilistic definition of locally compelled, the conditional independence oriented definition of enabling sets, and strictly graphical notions. This allows us to finally state the proof of Theorem 3: Theorem 21. Let BN B 1 (G 1,Φ 1 ) and DAG faithful probability distributions 1 and 2 each be defined over variables V, and h be a DAG faithful 21

22 sample sequence of size 2 and rank r, where 1 is the distribution of the first sample and 2 is the distribution of the last sample. Furthermore, let 1 be similar to 2 on S, and B 2 (G 2,Φ 2 ) be the result of running Algorithm 1 on B 1 with cases h and some choice of k. If 1. G 1 is equivalent to G 1, 2. r > k, 3. ChangeInStream(, k) is true for a variable X iff X / S and the algorithm is currently processing the k th sample drawn from 2, 4. UncoverMarkovBoundary(X, h) returns MB h (X), 5. G X returned from UncoverAdjacencies(X, M X, h) connects X to Y iff Y / S X h, and 6. the oracle used in artiallydirect( ) is correct, then G 2 is equivalent to G 2. roof. Given Assumptions 2 and 3 it is obvious that Adapt( ) does nothing except exactly once, when it calls UpdateNet( ) on B 1, V \S, and the first k cases h k of the partition of h sampled from 2. According to Corollary 14 if we can prove that i) two nodes are adjacent in G 2 iff they are adjacent in G 2, and ii) an arc is put in G 2 by artially- Direct( ) or MergeAndDirect( ) iff it is a locally compelled arc in G 2 then we have that G 2 is identical to G 2. The result then follows from the correction of Rules 1 to 3 on page 10, which have been proved in (Spirtes, Glymour, and Scheines 2001). i) Let X,Y V be adjacent in G 2. It is either the case that both of X and Y are in S or not. In the latter case we assume WLOG that X S. Then X and Y have been set adjacent by UncoverAdjacencies( ), and by Assumption 5 this means that Y / S X h k. Since h is piecewise DAG faithful, and h k is sampled solely from the partition generated by 2, it follows that Y / S X 2, and by Corollary 6 that X and Y are adjacent in G 2. In the first case, X and Y are adjacent in G 2 iff they are so in G 1, so we have that X and Y are adjacent in G 1. By Corollary 6 and Assumption 1 this means that Y S X 1. Since 1 and 2 are similar on S and thus X, it follows that Y S X 2, and by Corollary 6 that X and Y are adjacent in G 2. roving that if X and Y are non-adjacent in G 2 then they are so also in G 2 follows the same reasoning. 22

23 ii) We consider arcs emanating from a node X in G 2. First, let X S. This means that the locally compelled arcs emanating from X in G 2 are exactly those in G 1. By Corollary 15 and Assumption 1, these are again the same as those in G 1, and by Corollary 20 these are exactly those created by MergeAndDirect( ). Thus CH G 2 (X) = CH G 2 (X). Assume then that X / S. This means that the arcs out of X are created by artiallydirect( ) and only by this method. Due to Assumptions 4 and 5 this method sets Y as a child of X iff it can find a variable Z and a set T MB hk (X) such that I hk (X,Z T) and I hk (X,Z T {Y }). By Assumption 6 this means that Y is set as a child of X iff X hk Z T and X hk Z T {Y }. Since h k is a representative sample from 2, it follows that Y is set as a child of X iff X 2 Z T and X 2 Z T {Y }, and hence by Corollary 20 that Y is set as a child of X in G 2 iff X Y is a locally compelled arc in G 2. This proves the result. 5 Experiments and Results To investigate how our method behaves in practice, we ran a series of experiments with a fully implemented version of the method. In our implementation we have used the conflict measure of Jensen, Chamberlain, Nordahl, and Jensen (1991) in the method ConflictMeasure(B, X i, (x 1,...,x m )): log E (X i = x i ) B (X i = x i X j = x j j i), where the empty BN structure E is adapted to the observations using fractional updating. For ChangeInStream(c X, k) we calculated the second discrete cosine transform component (see e.g. (ress, Teukolsky, Vetterling, and Flannery 2002)) of the last 2k measures in c X, and had ChangeIn- Stream( ) return true if this statistic exceeded UncoverMarkovBoundary( ) was implemented as the decision tree learning method of (Frey, Fisher, Tsamardinos, Aliferis, and Statnikov 2003), and UncoverAdjacencies(X, M X, h) corresponds to the AlgorithmCD( ) method of (eña, Björkegren, and Tegnér 2005) restricted to the variables in M X. This method uses a greedy method for iteratively growing and shrinking the estimated set of adjacent variables until no further change takes place. Both of these methods need an independence oracle, and for that we have used a χ 2 test. This test was also used as oracle in artiallydirect( ), 4 We are unaware of any previous work using this kind of change point detection, but in our setting it outperformed the traditional methods of log-odds ratios and t-tests. 23

24 We constructed 100 experiments, where each consisted five randomly generated BNs over ten variables, B 1,...,B 5. We made sure that B i was structurally similar to B i 1 except for the connection between two randomly chosen nodes. All CTs in B i were kept the same as in B i 1, except for the nodes with a new parent set. For these we employed four different methods for generating new distributions: A estimated the probabilities from the previous network with some added noise to ensure that no two distributions were the same. B to D generated entirely new CTs, with B being totally random, C being restricted to CTs that implied strong dependencies between the parent variables and the child, and D being restricted to only those CTs where these dependencies were extremely strong. For generation of the initial BNs we used the method of (Ide, Cozman, and Ramos 2004). For each series of five BNs, we sampled r cases from each network and concatenated them into a piecewise DAG faithful sample sequence of rank r and length 5r, for r being 500, 1000, and We fed our method (NN) with the generated sequences, using different values of k (100, 500, and 1000), and measured the precision wrt. the KLdistance on each. As mentioned we are unaware of other work geared towards non-stationary distributions, but for base-line comparison purposes, we implemented the structural adaptation methods of Friedman and Goldszmidt (1997) (FG) and Lam and Bacchus (1994) (LB), which serve as representatives of the state of the art. For the method of Friedman and Goldszmidt (1997) we tried both simulated annealing (FG-SA) and a more time consuming hill-climbing (FG-HC) for the unspecified search step of the algorithm. As these methods have not been developed to deal with non-stationary distributions, they have to be told the delay between learning. For this we used the same value k, that we use as allowed delay for our own method, as this ensure that all methods stores only a maximum of k full cases at any one time. Each method was given the correct initial network to start its exploration. Space does not permit us to present the results in full, but the precision of both NN, FG-SA, and FG-HC are presented in Figures 1 and 2. NN outperformed FG-SA in 81 of the experiments, and FG-HC in 65 of the experiments. The precision of the LB method was much worse than for either of these three. The one experiment, where both the FG methods outperformed the NN method substantially, had r equal to and k equal to 5000, and was thus the experiment closest to the assumption on stationary distributions of the FG and LB learners. What is interesting is that another experiment having similar r and k values, but generated with method B rather than D, gave the exact opposite result (NN being much 24

COMP538: Introduction to Bayesian Networks

COMP538: Introduction to Bayesian Networks COMP538: Introduction to Bayesian Networks Lecture 9: Optimal Structure Learning Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering Hong Kong University of Science and Technology

More information

Abstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables

Abstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables N-1 Experiments Suffice to Determine the Causal Relations Among N Variables Frederick Eberhardt Clark Glymour 1 Richard Scheines Carnegie Mellon University Abstract By combining experimental interventions

More information

Learning in Bayesian Networks
