CAUSAL MODELS: THE MEANINGFUL INFORMATION OF PROBABILITY DISTRIBUTIONS

Jan Lemeire, Erik Dirkx
ETRO Dept., Vrije Universiteit Brussel
Pleinlaan 2, 1050 Brussels, Belgium

ABSTRACT
This paper claims that causal model theory describes the meaningful information of probability distributions after a factorization. If the minimal factorization of a distribution is incompressible, its Kolmogorov minimal sufficient statistic, the parents lists, can be represented by a directed acyclic graph (DAG). We show that a faithful Bayesian network is a minimal factorization and that a Bayesian network with random and unrelated conditional probability distributions (CPDs) is faithful and thus a minimal factorization. The validity of faithfulness depends on the presence of other regularities. The Bayesian network is a canonical representation: it uniquely decomposes the distribution into independent submodels, the CPDs. In the absence of further information, we may assume modularity and that the model offers a good hypothesis about the underlying mechanisms of the system.

KEY WORDS
Causal Models, Kolmogorov Complexity, Meaningful Information, Reductionism.

1 Introduction

Kolmogorov complexity gives an objective measure of the complexity of an object, which allows the formal application of Occam's Razor to modeling. The central idea is that modeling can be equated with finding regularities in data. An objective property of a regularity is its ability to compress the data, i.e. to describe the data using fewer symbols than the number of symbols needed to describe the data literally. The more regularities there are, the more the data can be compressed. The regularities of the data constitute its meaningful information. A good model captures only the regularities, not the accidental, random information of the data. The simplest model that does so is called the Kolmogorov minimal sufficient statistic for the data.

The theory of causal models, as developed by Pearl, Verma, Spirtes, Glymour, Scheines et al., gives a probabilistic view on causation and is based on the theory of Bayesian networks. A Bayesian network consists of a directed acyclic graph (DAG) and a conditional probability distribution (CPD) for each variable. A causal model gives a causal interpretation to the edges of a faithful Bayesian network. The causal interpretation of the edges and the validity of faithfulness are often criticized [4, 2, 18]. The causal interpretation is defined as the ability to predict the effect of changes to the system (the so-called interventions) and is based on the modularity assumption. Faithfulness demands that the model reflects all conditional independencies of the probability distribution. Conditional independencies are qualitative properties and, as we will show, are regularities that allow compression of the description of the probability distribution. Through these regularities we establish a correspondence between causal model theory and the Kolmogorov minimal sufficient statistic. Maximal use of the independencies leads to the minimal factorization. We show that in the absence of other regularities the model is faithful and results in a decomposition of the distribution into independent CPDs, which supports the modularity assumption.

The next section shows how meaningful information can be separated from random information by use of the Kolmogorov structure function. Section 3 defines causal models and Section 4 discusses related work. Section 5 applies the minimality principle to distributions, and in the following section the correspondence with causal model theory is shown. Finally, Section 7 discusses the assumptions.

2 Meaningful Information

The Kolmogorov complexity of an object x is defined as the length of the shortest computer program that prints x and then halts [8]:

K(x) = \min_{p:\, U(p) = x} l(p)    (1)

with U a universal computer. p gives the shortest description of x, but not all bits of p can be regarded as containing meaningful information. We consider meaningful information to be the regularities that allow compression of x [15]. We therefore seek a description in two parts: one containing the meaningful information, which we put in the model, and one containing the remaining random noise, which we put in the data-to-model code. A model can be related to a model set, containing all objects the model can represent. We look for a model set S that contains x and the objects that share x's regularities.
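
To make the two-part idea concrete, the following sketch (ours, not the paper's; the model class "n-bit strings with exactly k ones" and the example strings are purely illustrative) compares the literal description length of a binary string with an idealized two-part code length: the bits needed to state the model set plus the log-size of that set.

# A toy illustration of the two-part code idea: the "model" is the set S of
# all n-bit strings with exactly k ones, and the data-to-model code is the
# index of x within S (log2 |S| bits). The model class is an assumption made
# purely for illustration.
from math import comb, log2

def two_part_length(x: str) -> float:
    """Idealized two-part code length of a binary string x under the model
    class 'strings of length n with exactly k ones'."""
    n, k = len(x), x.count("1")
    model_bits = log2(n + 1)               # enough bits to state k in {0, ..., n}
    data_to_model_bits = log2(comb(n, k))  # log-size of the model set S
    return model_bits + data_to_model_bits

for x in ["0" * 60 + "1" * 4,              # highly regular: only four ones
          "0110100110010110" * 4]:         # a more balanced pattern
    print(len(x), "literal bits vs", round(two_part_length(x), 1), "two-part bits")

For the highly regular string the two-part code is far shorter than the literal 64 bits, while for the balanced string this model class buys nothing (the two-part length even slightly exceeds the literal length), mirroring the behaviour of the structure function for random strings described next.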

The Kolmogorov structure function of x is defined as the log-size of the smallest set containing x that can be described with no more than k bits [3]:

K(k, x) = \min_{p:\, l(p) \le k,\ U(p) = S,\ x \in S} \log|S|    (2)

with |S| the size of the set S. A typical graph of the structure function is illustrated in Figure 1. We start with k = 0 and increase the allowed complexity k of the model set. When k = 0, the only set that can be described is the entire set {0, 1}^n, so the corresponding log set size is n. As we increase k, the model can take advantage of the regularities of x, in such a way that each bit reduces the set's size by more than half; the slope of the curve is smaller than -1. When all regularities are exploited, each additional bit of k reduces the set only by half, and we proceed along the line of slope -1 until k = K(x), where the smallest set that can be described is the singleton {x}. The curve K(S) + \log|S| is also shown on the graph. It represents the descriptive complexity of x when using the two-part code; from k = k* on it reaches its minimum and equals K(x). For random strings the curve starts at \log|S| = n for k = 0 and drops with slope -1 until it reaches the x-axis at k = n: each bit of k reveals one of the bits of x and reduces the set by half.

The Kolmogorov minimal sufficient statistic is defined as the program p* that describes the smallest set S* such that K(S*) + \log|S*| \le K(x) [5]. The two-stage description of x is then as good as the best single-stage description of x. The descriptive complexity of S* is k*.

Figure 1. Kolmogorov structure function for an n-bit string x; k* is the Kolmogorov minimal sufficient statistic of x.

3 Causal Models

We elaborate the theory of causal models, or causally interpreted Bayesian networks, in three steps. First, we show how Bayesian networks describe probability distributions. Secondly, faithful models are defined as describing all independencies of a distribution. Ultimately, a causal interpretation is given to the network.

3.1 Representation of Distributions

Causal models offer a probabilistic interpretation of causality. They are fundamentally Bayesian networks, which offer dense representations of joint distributions. A joint distribution is defined over a set of stochastic variables X_1, ..., X_n and assigns a probability P \in [0, 1] to each possible state (x_1, ..., x_n) \in X_{1,dom} \times \cdots \times X_{n,dom}, where X_{i,dom} stands for the domain of variable X_i. The joint distribution can be factorized relative to a variable ordering (X_1, ..., X_n) as follows:

P(X_1, ..., X_n) = \prod_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})    (3)

Variable X_j can be removed from the conditioning set of variable X_i if X_i becomes conditionally independent of X_j given the rest of the set:

P(X_i | X_1, ..., X_{i-1}) = P(X_i | X_1, ..., X_{j-1}, X_{j+1}, ..., X_{i-1})    (4)

Such conditional independencies reduce the complexity of the factors in the factorization. The conditioning sets of the factors can be described by a directed acyclic graph (DAG), in which each node represents a variable and has incoming edges from all variables of the conditioning set of its factor. The joint distribution is then described by the DAG and the conditional probability distributions (CPDs) of the variables conditioned on their parents: P(X_i | parents(X_i)). A Bayesian network is a factorization that is minimal, in the sense that no edge can be deleted without destroying the correctness of the factorization. Although a Bayesian network is minimal, it depends on the chosen variable ordering.
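
Before turning to the effect of the variable ordering, a small sketch (ours; the variable names, the toy CPD values and the greedy removal order are illustrative assumptions, not the paper's) shows the reduction step of Eq. (4) at work: start from the chain-rule factorization of Eq. (3) along an ordering and drop a conditioning variable whenever the corresponding conditional independency holds exactly on the joint table.

# Sketch of the pruning step of Eq. (4) on a toy joint distribution.
from itertools import product

ORDER = ("A", "B", "C")                 # illustrative variable ordering
DOM = {v: (0, 1) for v in ORDER}        # binary domains

# Toy joint built from P(A) P(B|A) P(C|B), so C _||_ A | B holds by construction
# and pruning should rediscover exactly that structure.
P_A = {0: 0.3, 1: 0.7}
P_B_A = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}    # (b, a) -> P(b|a)
P_C_B = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.25, (1, 1): 0.75}  # (c, b) -> P(c|b)
JOINT = {(a, b, c): P_A[a] * P_B_A[(b, a)] * P_C_B[(c, b)]
         for a, b, c in product((0, 1), repeat=3)}

def prob(assign):
    """P(assign), where assign maps a subset of the variables to values."""
    return sum(p for vals, p in JOINT.items()
               if all(vals[ORDER.index(v)] == x for v, x in assign.items()))

def indep(x, y, given, tol=1e-12):
    """Exact check of x _||_ y | given via P(x,y,z) P(z) = P(x,z) P(y,z)."""
    for vals in product(*(DOM[v] for v in [x, y] + given)):
        a = dict(zip([x, y] + given, vals))
        z = {v: a[v] for v in given}
        xz = {x: a[x], **z}
        yz = {y: a[y], **z}
        if abs(prob(a) * prob(z) - prob(xz) * prob(yz)) > tol:
            return False
    return True

# Chain-rule factorization along ORDER, then prune the conditioning sets (Eq. 4).
parents = {}
for i, xi in enumerate(ORDER):
    cond = list(ORDER[:i])
    for xj in list(cond):
        rest = [v for v in cond if v != xj]
        if indep(xi, xj, rest):
            cond = rest
    parents[xi] = cond

print(parents)   # expected: {'A': [], 'B': ['A'], 'C': ['B']}

On this toy table the greedy pass recovers the parent sets {A: [], B: [A], C: [B]}; in general the outcome of such a greedy pruning can depend on the order in which candidate parents are tested.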
Some orderings lead to the same network, while others result in different topologies. Take five stochastic variables A, B, C, D and E. Fig. 2 shows the graph that is constructed by simplifying the factorization based on the variable ordering (A, B, C, D, E) using the three given conditional independencies. The Bayesian network describing the same distribution but based on the ordering (A, B, C, E, D), depicted in Fig. 3, contains two edges fewer because of five useful independencies. We define the minimal factorization as the factorization that has the fewest variables in its conditioning sets.

Figure 2. Factorization based on variable ordering (A, B, C, D, E) and reduction by three independencies.

Figure 3. Bayesian network based on variable ordering (A, B, C, E, D) and five independencies.

3.2 Representation of Independencies

Pearl, Verma and others started to interpret the DAG of a Bayesian network as a representation of the conditional independencies of a joint distribution [10]. They constructed a graphical criterion, called d-separation, for retrieving from the graph the independencies that follow from the Markov condition, which states that a node becomes independent of its non-descendants by conditioning on its parents. Take the graph of Fig. 3. The d-separation criterion tells us that variable B separates A from E, since B blocks the path A -> B -> E. On the other hand, the path A -> C -> D <- E is blocked by C -> D <- E, which is called a v-structure; this path becomes unblocked given D. The Markov condition holds for a Bayesian network, so every independency found with the d-separation criterion in its DAG appears in the distribution. A Bayesian network is called faithful to the distribution if it represents all conditional independencies of the distribution.

3.3 Representation of Causal Relations

Where Bayesian networks are mainly concerned with offering a dense and manageable representation of joint distributions, causal models intend to describe graphically the structure of the underlying physical mechanisms governing the system under study. In a causal model the state of each variable, represented by a node in the graph, is generated by a stochastic process that is determined by the values of its parent variables in the graph. The stochastic variation of this assignment is assumed to be independent of the variations in all other assignments, and each assignment process remains invariant to possible changes in the assignment processes that govern the other variables in the system. This modularity assumption enables the prediction of the effect of interventions, which are defined as specific modifications of some factors in the product of the factorization (Eq. 3) [11].

A causal model corresponds to a joint distribution defined over the variables, and this results in a close connection between causal and probabilistic dependence [14]. For a causal model, the Causal Markov Condition tells us how variables depend on each other: each variable is probabilistically independent of its non-effects conditional on its direct causes. The probabilistic aspect of the condition is similar to the Markov condition. Hence, a causal model can be regarded as a Bayesian network in which all edges are interpreted as representing causal influences between the corresponding variables. This interpretation reflects the second aspect of the Causal Markov Condition: every probabilistic dependence must have a causal explanation (the so-called Principle of the Common Cause) [18]. Furthermore, causal model theory is based on the Minimality Principle (minimality of the model) and the Faithfulness Property (the model describes all independencies). Spirtes, Glymour and Scheines rely in their work on causal models on an axiomatization of these three conditions [13].
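
As a concrete companion to the d-separation criterion of Section 3.2, here is a minimal sketch (ours) of the classical test via the ancestral, moralized graph. The edge list is our reading of the Fig. 3 structure (A -> B, A -> C, B -> E, C -> D, E -> D) and is only illustrative.

# d-separation test via the standard construction: keep the ancestral subgraph
# of {x, y} and the conditioning set, moralize it (marry co-parents, drop
# directions), delete the conditioning nodes, and check whether x and y are
# still connected. EDGES is our reading of Fig. 3 and is an assumption.
EDGES = [("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("E", "D")]

def parents_of(node):
    return {p for p, c in EDGES if c == node}

def ancestors(nodes):
    result, frontier = set(nodes), set(nodes)
    while frontier:
        frontier = {p for n in frontier for p in parents_of(n)} - result
        result |= frontier
    return result

def d_separated(x, y, given):
    keep = ancestors({x, y} | set(given))
    # moralize the ancestral subgraph: original edges plus married co-parents
    undirected = {frozenset(e) for e in EDGES if set(e) <= keep}
    for node in keep:
        ps = list(parents_of(node) & keep)
        undirected |= {frozenset((a, b)) for a in ps for b in ps if a != b}
    # breadth-first search from x that never enters a conditioning node
    blocked, seen, frontier = set(given), {x}, {x}
    while frontier:
        frontier = {n for e in undirected for n in e
                    if (e - {n}) & frontier and n not in seen and n not in blocked}
        seen |= frontier
    return y not in seen

print(d_separated("A", "E", ["B"]))       # True:  B blocks the chain A -> B -> E
print(d_separated("A", "E", []))          # False: that chain is unblocked
print(d_separated("A", "E", ["B", "D"]))  # False: conditioning on the collider D
                                          #        unblocks A -> C -> D <- E
print(d_separated("A", "D", ["C", "E"]))  # True:  the Markov condition for D

The first and third calls reproduce the two statements made about Fig. 3 above; the last call is the Markov condition for D given its parents.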
4 Related Work

Kolmogorov complexity and related methods, such as Minimum Message Length (MML) [17, 16] and Minimum Description Length (MDL) [12], are mostly used for selecting the best model from a given set of models. The choice of the model class, however, determines which regularities are considered. In our discussion we try not to stick to an a priori chosen set of regularities, but search for the relevant regularities.

By a theorem in [11], Pearl describes for which distributions faithful graphs exist and can be learned: the absence of d-separation implies dependence in almost all distributions compatible with the graph G. The reason is that a precise tuning of the parameters is required to generate an independency along an unblocked path in the diagram, and such tuning is unlikely to occur in practice. Pearl solves this problem by imposing a stability restriction on the distribution [11] (sec. 2.4): the occurrence of any independency must remain invariant to any change in the distributional parametrization of the graph. This corresponds to regularities in the CPDs, as will be proved by Theorem 4; a change of the CPDs would break the regularity. Pearl claims that there exists at least one distribution faithful to the model, while we show that all typical distributions of the DAG model set are faithful. The interventions viewpoint on causality describes only one aspect of causality; see [18] for an overview of different views.

5 Minimal Description of Distributions

A joint distribution P(X_1, ..., X_n) can be described more compactly by a factorization that is reduced by conditional independencies. The minimal factorization leads to P(X_1, ..., X_n) = \prod_i CPD_i. The descriptive size of the CPDs is determined by the number of variables in the conditioning sets. The total number of conditioning variables thus defines the shortest factorization.
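
The bookkeeping behind this claim can be sketched as follows (ours; the binary domains, the precision d = 32 and the two parent-set assignments are illustrative, the second being our reading of the Fig. 3 structure): a CPD P(X | parents(X)) needs (|X_dom| - 1) times the product of its parents' domain sizes probabilities of d bits each, and each parents list costs one n-bit membership string.

# Description-length bookkeeping for a factorization (illustrative numbers).
from math import prod

def description_bits(parent_sets, dom_sizes, d=32):
    n = len(parent_sets)
    cpd_bits = sum((dom_sizes[x] - 1) * prod(dom_sizes[p] for p in ps) * d
                   for x, ps in parent_sets.items())
    parents_list_bits = n * n              # one n-bit membership string per variable
    return parents_list_bits + cpd_bits

dom = {v: 2 for v in "ABCDE"}              # binary variables, purely for illustration

# Chain-rule factorization along (A, B, C, D, E): no independencies exploited.
full = {"A": [], "B": ["A"], "C": ["A", "B"],
        "D": ["A", "B", "C"], "E": ["A", "B", "C", "D"]}
# A pruned factorization (our reading of the Fig. 3 structure, illustrative only).
pruned = {"A": [], "B": ["A"], "C": ["A"], "E": ["B"], "D": ["C", "E"]}

print(description_bits(full, dom), "bits vs", description_bits(pruned, dom), "bits")

With these numbers the pruned factorization needs 377 bits against 1017 bits for the unreduced chain-rule factorization.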

A two-part description is then:

descr(P(X_1, ..., X_n)) = {parents(X_1), ..., parents(X_n)} + {CPD_1, ..., CPD_n}    (5)

Note that the parents lists can be described very compactly, for example with an n-bit string per list in which bit i is 1 if X_i is present in that list. The following theorems show that the first part offers the minimal model if the CPDs are random and unrelated.

Theorem 1 The parents lists {parents(X_1), ..., parents(X_n)} in the two-part code given by Eq. 5 contain meaningful information of a probability distribution.

Every variable X_j that can be eliminated from the conditioning set of X_i due to a conditional independency as stated by Eq. 4 reduces the descriptive complexity by

(|X_{i,dom}| - 1) \cdot |X_{1,dom}| \cdots |X_{j-1,dom}| \cdot (|X_{j,dom}| - 1) \cdot |X_{j+1,dom}| \cdots |X_{i-1,dom}| \cdot d    (6)

with |X_{k,dom}| the size of the domain of X_k and d the precision in bits to which each probability is described. The description of variable X_j in the parents list takes no more than \log n bits, which is almost always lower than the above complexity reduction (except when d is taken absurdly small). Every bit of the parents lists thus reduces the descriptive complexity by more than one bit and is therefore meaningful information.

Theorem 2 If the two-part code description of a probability distribution, given by Eq. 5, results in an incompressible string, the first part is a Kolmogorov minimal sufficient statistic.

If a more compact description of the distribution existed, the two-part decomposition would contain redundant bits. Theorem 1 showed that the first part contains meaningful information. The second part does not, since it is incompressible. The first part, described minimally, is therefore the Kolmogorov minimal sufficient statistic.

The distribution decomposes uniquely and minimally into the CPDs, which are atomic and independent (there can be multiple minimal factorizations, but they are closely related; we come back to this in the next section). The decomposition thus offers a canonical representation. The system under study is decomposed into independent subsystems that are only connected via the variables. In the absence of further information, we may assume that each CPD represents a part of reality. This implies modularity: one subsystem can be replaced by another without affecting the rest of the system.

6 Equivalence with Causal Model Theory

We hypothesize that the above decomposition is equivalent to the theory of causal models. The relation between both is proved by two theorems.

6.1 Relation between minimal factorizations and Bayesian networks

Theorem 3 If a faithful Bayesian network exists for a distribution, it is the minimal factorization.

Oliver and Smith define the conditions for sound transformations of Bayesian networks, where sound means that the transformation does not introduce extraneous independencies [9]. No edge removal is permitted, only reorientation and addition of edges. Additionally, if a reorientation destroys a v-structure or creates a new one, an edge should be added connecting the common parents of the former or of the newly created v-structure. Such transformations, however, eliminate some independencies represented by the original graph. Assume the existence of a Bayesian network, based on a different variable ordering, that has fewer edges than the faithful network. It must be possible to transform one into the other. That network has fewer edges, so edges must be added by the transformation, and this destroys independencies. But no network can represent more independencies than the faithful network, because the faithful network represents all independencies of the distribution. The assumption thus leads to a contradiction.

Theorem 4 A Bayesian network with unrelated, random conditional probability distributions (CPDs) is faithful.

Recall that a Bayesian network is a factorization that is edge-minimal. This means that for each parent pa_{i,j} of variable X_i

P(X_i | pa_{i,1}, ..., pa_{i,j}, ..., pa_{i,k}) \neq P(X_i | pa_{i,1}, ..., pa_{i,j-1}, pa_{i,j+1}, ..., pa_{i,k})    (7)

The proof shows that any two variables that are d-connected are dependent, unless the probabilities of the CPDs are related. We consider the following possibilities: the two variables can be adjacent (a), related by a Markov chain (b) (recall that a Markov chain is a path not containing v-structures), related by a v-structure (c), or connected by a combination of both or by multiple paths (d).

First we prove that a variable marginally depends on each of its adjacent variables (a). Consider nodes D and E of the Bayesian network of Fig. 3. To keep the proof readable we demonstrate that P(D|E) \neq P(D); the proof can easily be generalized.

The first term can be written as

P(D|E) = P(D|E, c_1) P(c_1) + P(D|E, c_2) P(c_2) + \ldots    (8)

with c_1, c_2 \in C_{dom}. C is also a parent of D; thus, by Eq. 7, there are at least two values in C_{dom} for which P(D|E, c_i) \neq P(D|E). (P(D|E) is a weighted average of the P(D|E, C); if one probability P(D|E, c_1) differs from this average, say it is higher, then there must be at least one value lower than the average and thus different.) Take c_1 and c_2 to be such values, so that P(D|E, c_1) \neq P(D|E, c_2). There are also at least two such values in E_{dom}; take e_1 and e_2. To obtain an independency, Eq. 8 should equal P(D) for all values of E. This results in the following relation among the probabilities:

P(D|e_1, c_1) P(c_1) + P(D|e_1, c_2) P(c_2) = P(D|e_2, c_1) P(c_1) + P(D|e_2, c_2) P(c_2)    (9)

Note that the equation cannot be reduced: the conditional probabilities are equal neither to P(D) nor to each other.

Next, by the same arguments it can be proved that variables connected by a Markov chain are by default dependent (b). Take A -> B -> E in Fig. 3; independence of A and E requires that

P(E|a) = \sum_{b \in B_{dom}} P(E|b) P(b|a) = P(E)  for all a \in A_{dom},    (10)

and this also results in a regularity among the CPDs. In a v-structure, both causes are dependent when conditioned on their common effect (c): for C -> D <- E, P(D|C, E) \neq P(D|E) holds by Eq. 7. Finally, if there are multiple unblocked paths connecting two variables, then independence of both variables implies a regularity too (d). Take A and D in Fig. 3:

P(D|A) = \sum_{b \in B_{dom}} \sum_{c \in C_{dom}} \sum_{e \in E_{dom}} P(D|c, e) P(c|A) P(e|b) P(b|A)    (11)

Note that P(c, e|A) = P(c|A) P(e|A) follows from the independence of C and E given A. All factors in the equation satisfy Eq. 7, so the equation only equals P(D) if there is a relation among the CPDs.

Table 1 gives an example distribution P(D|E, C) for which D and E are independent, assuming that P(C=0) = P(C=1) = 0.5; the regularity of Eq. 9 applies to it.

Table 1. Example of a CPD P(D|C, E) for which P(D|E) = P(D), assuming that P(C=0) = P(C=1) = 0.5.

From the theorem it follows that the Bayesian network is a minimal factorization. Bayesian networks that are not based on a minimal factorization, such as the one of Fig. 2, are always compressible, namely by the regularities among the CPDs that follow from the independencies not represented by them.

Multiple faithful models can exist for a distribution, though. These models represent the same set of independencies and are therefore statistically indistinguishable; they define a Markov-equivalence class. It is proved that they share the same v-structures and only differ in the orientation of some edges [11]. The corresponding factorizations have the same number of conditioning variables and thus all have the same complexity. Observations alone cannot single out the correct model, but we have demarcated a set of closely related models which contains the correct model.
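
As a numerical companion to Table 1 and Eq. 9 (the CPD values below are ours, not the paper's, and C and E are taken as independent parents of D to keep the computation elementary), the following sketch shows a fine-tuned CPD that renders D marginally independent of E although E remains a genuine parent.

# Numerical check (illustrative values) of the regularity in Eq. 9: the rows
# are chosen so that P(D=1|E=0,C=0) + P(D=1|E=0,C=1)
#                 == P(D=1|E=1,C=0) + P(D=1|E=1,C=1).
# With P(C=0) = P(C=1) = 0.5 and C independent of E in this toy model,
# the fine-tuning makes D independent of E although the edge E -> D is present.
P_C = {0: 0.5, 1: 0.5}
P_E = {0: 0.4, 1: 0.6}                     # any marginal for E will do
P_D1 = {(0, 0): 0.2, (0, 1): 0.8,          # (e, c) -> P(D=1 | E=e, C=c)
        (1, 0): 0.3, (1, 1): 0.7}          # 0.2 + 0.8 == 0.3 + 0.7  (Eq. 9)

def p_d1_given_e(e):
    return sum(P_D1[(e, c)] * P_C[c] for c in (0, 1))

for e in (0, 1):
    print(f"P(D=1 | E={e}) = {p_d1_given_e(e):.3f}")        # 0.500 for both
print(f"P(D=1)         = {sum(p_d1_given_e(e) * P_E[e] for e in (0, 1)):.3f}")

Both conditionals and the marginal come out at 0.5, so the independency holds only because of the regularity among the CPD entries; perturbing any single entry destroys it.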
6.2 Equivalence

The conditions for causal models, Minimality, Faithfulness and the Causal Markov Condition (Section 3.3), are fulfilled for a minimal factorization with random CPDs. Minimality holds by definition, faithfulness is proved by Theorem 4, and the conditional independencies that follow from the Markov condition are present since it is a valid Bayesian network. Finally, the causal interpretation of the edges is correct as long as we define causality in terms of interventions: the modularity of the decomposition captures Pearl's interventions. An intervention, which Pearl considers an atomic operation, can be seen as replacing one specific CPD with a CPD that allows perfect control over the variable (for setting it to a certain state). We hypothesize that the consequences of causal models, like d-separation and the inference and identifiability algorithms, conform with the CPD decomposition: they depend solely on the CPDs and the variables that link them.

Take the flow of information through a causal model. In the model of Fig. 4, variables D and E contain information about A. This information is captured by C: the decrease of uncertainty about A depends on the information that D or E provide about C, but not on whether that information comes from D or from E. C screens A off from D and E, and also D from E. The interaction between these variables happens via C and is represented by the edges. The graphical representation of a causal model thus suggests that the edges constitute the atomic elements of the model. This cannot, however, explain the interaction between A and B: C does not screen A off from B, and moreover C must be known for a dependency between A and B to arise. This interaction pattern is captured by taking the CPDs as the atomic elements. We can say that the information travels between the CPDs through the variables.

Figure 4. Example Causal Model.
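
A minimal sketch (ours; the two-variable model and its numbers are illustrative) of the "intervention = CPD replacement" reading above: do(A = 1) swaps the CPD of A for a point mass and leaves the other CPDs untouched.

# Intervention as CPD replacement in a two-variable model A -> B:
# do(A = 1) replaces the CPD of A by a point mass on 1 and keeps P(B|A) intact.
P_A = {0: 0.7, 1: 0.3}
P_B_given_A = {(0, 0): 0.9, (1, 0): 0.1,    # (b, a) -> P(B=b | A=a)
               (0, 1): 0.2, (1, 1): 0.8}

def p_b(b, cpd_a):
    """P(B=b) in the model defined by the CPD of A and the unchanged P(B|A)."""
    return sum(P_B_given_A[(b, a)] * cpd_a[a] for a in (0, 1))

observational = {b: p_b(b, P_A) for b in (0, 1)}
do_a1 = {b: p_b(b, {0: 0.0, 1: 1.0}) for b in (0, 1)}   # replaced CPD: A forced to 1

print("P(B)           =", observational)   # roughly {0: 0.69, 1: 0.31}
print("P(B | do(A=1)) =", do_a1)           # {0: 0.2, 1: 0.8}, i.e. P(B | A=1) here

Because A has no parents and no confounder in this sketch, the interventional distribution coincides with the conditional P(B | A=1); in general the two differ, and the replacement semantics is what distinguishes them.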

7 Validity

7.1 Validity of Faithfulness

Faithfulness of a causal model is the cornerstone of causal model theory and the accompanying learning algorithms. We showed that a causal model relies on a specific type of regularity, the conditional independencies that follow from the Markov condition. The simplest model should, however, exploit all regularities, and there are regularities that a causal model does not capture. If such regularities appear, the minimal Bayesian network can be either faithful or not.

If the model remains faithful, the additional regularities do not interfere with the conditional independencies. They can then be regarded as regularities of a lower level. A well-known example is when the description of an individual CPD can be further compressed; this regularity is called local structure [1] and appears inside a building block.

If the minimal Bayesian network is unfaithful, the regularities generate independencies that do not result from the Markov condition alone. This does not exclude that the distribution might still be described minimally by a causal model augmented with a description of the additional regularities, i.e. that the CPD decomposition is still valid. The best-known example of unfaithfulness is when, in the model of Fig. 5, A and D appear to be independent [13]. This happens when the influences along the paths A -> B -> D and A -> C -> D exactly balance, so that they cancel each other out and the net effect is an independence. The independence of A and D is, however, not expected by the causal model: the distribution is not typical for the set of distributions that can be described by the model. d-separation describes the independencies that can be expected from the typical distributions of the causal model set.

Figure 5. Causal model in which A is independent from D.

Distributions with deterministic or functional relations cannot be represented by a faithful graph either [13]. In [7] we show that this is related to a violation of the intersection condition, one of the conditions that Pearl imposes on a distribution in the elaboration of causal theory and its algorithms [10]. The solution we proposed in [7] is to incorporate the information about deterministic relations in an augmented causal model, and to extend the d-separation criterion so that it can be used to retrieve all conditional independencies from the model. In this way the faithfulness of the model can be reestablished, and the model again incorporates all regularities of the data.

These examples do not challenge the validity of the causal interpretation of the model. The next section focuses on other counterexamples.
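
Before turning to those counterexamples, here is a concrete instance (ours, not the paper's) of the cancellation just described for the Fig. 5 structure: B and C each copy A with a little noise, D is their exclusive-or, and the two copies of A cancel exactly, so A and D come out independent even though both paths are active.

# Exact cancellation along the two paths A -> B -> D and A -> C -> D
# (the noise level is an illustrative choice).
from itertools import product

NOISE = 0.1                                # P(B != A) = P(C != A) = 0.1

def joint():
    """Yield (a, b, c, d, probability) for every configuration."""
    for a, nb, nc in product((0, 1), repeat=3):
        p = 0.5 * (NOISE if nb else 1 - NOISE) * (NOISE if nc else 1 - NOISE)
        b, c = a ^ nb, a ^ nc
        yield a, b, c, b ^ c, p            # d = b XOR c

def prob(pred):
    return sum(p for a, b, c, d, p in joint() if pred(a, b, c, d))

for a_val in (0, 1):
    p_cond = (prob(lambda a, b, c, d: d == 1 and a == a_val)
              / prob(lambda a, b, c, d: a == a_val))
    print(f"P(D=1 | A={a_val}) = {p_cond:.3f}")   # 0.180 for both values of A
p_link = prob(lambda a, b, c, d: b == 1 and a == 1) / 0.5
print(f"P(B=1 | A=1)  = {p_link:.3f}")            # 0.900: the links are strong

The link probabilities P(B=A) = P(C=A) = 0.9 show that neither path is vacuous; the independence of A and D is purely a consequence of the fine-tuned combination, exactly the kind of regularity that d-separation does not anticipate.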
7.2 Validity of the CPD Decomposition

The CPD decomposition of a joint distribution implies that the CPDs represent independent mechanisms. In the model of Fig. 4, CPD_D and CPD_E are independent: the states of D and E only depend on C. This decomposition is, however, not valid for all systems; for some systems the CPDs do not represent independent mechanisms. Take the example of particle decay, one of the counterexamples to the Causal Markov Condition reported in [18], p. 55, taken from van Fraassen (1980, p. 29): suppose that a particle decays into two parts, that conservation of total momentum obtains, and that it is not determined by the prior state of the particle what the momentum of each part will be after the decay. By conservation, the momentum of one part will be determined by the momentum of the other part. By indeterminism, the prior state of the particle will not determine what the momenta of each part will be after the decay. Thus there is no prior screener-off: the prior state S fails to screen off the momenta. But by symmetry, neither of the two parts' momenta M_1 and M_2 can be the cause of the other. This system cannot be represented by a causal model. The generation of M_1 and M_2 by S should be considered as one (causal) mechanism, as shown in Fig. 6. Some of the other counterexamples to the Causal Markov Condition given in [18] are similar.

Figure 6. Particle with state S decays into two parts with momenta M_1 and M_2.

Take the set of strings of n bits in which m consecutive bits are 1 and the others are 0. For n = 8 and m = 2, a valid string consists of a block of two consecutive 1-bits at one of seven possible positions, the remaining bits being 0. Every bit can be regarded as a discrete variable. By picking valid strings at random, the joint distribution is observed.
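
A quick enumeration (ours) of this family for n = 8 and m = 2 makes the point concrete: there are seven equally likely strings, indexed by the start position of the block, and the induced bit variables are pairwise dependent.

# Enumerate the valid strings for n = 8, m = 2 and inspect the induced joint
# distribution over the eight bit variables.
n, m = 8, 2
valid = [[1 if start <= i < start + m else 0 for i in range(n)]
         for start in range(n - m + 1)]          # the latent "start position"
N = len(valid)                                   # 7 equally likely strings

def p(pred):
    return sum(1 for s in valid if pred(s)) / N

p0 = p(lambda s: s[0] == 1)
p1 = p(lambda s: s[1] == 1)
p01 = p(lambda s: s[0] == 1 and s[1] == 1)
print(round(p01, 3), "vs", round(p0 * p1, 3))    # 0.143 vs 0.041: bits 0 and 1 depend
print(["".join(map(str, s)) for s in valid])     # the seven valid strings

The start position acts as the single latent mechanism of the kind depicted in Fig. 7(b).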

All bits are correlated, but each pair becomes independent by conditioning on some other bits. The simplest model for this pattern contains a latent variable denoting the start position of the non-zero bit sequence. The causal model shown in Fig. 7(a), however, considers each edge as a separate mechanism. But the mechanisms are not unrelated: the decomposition is not valid, and the model fails to represent the many conditional independencies. The model of Fig. 7(b) is more accurate; it indicates that there is one mechanism generating the states of all bits.

Figure 7. Two models for a pattern in an 8-bit string.

Finally, note that faithfulness can be interpreted in a broader sense as the ability of a model to explain all regularities of the data.

8 Conclusions

The conditional independencies on which causal model theory is based can be regarded as the regularities that allow compression of distributions and the construction of minimal models. We showed that the meaningful information of a causal model lies in its DAG, which defines the decomposition of the distribution into independent submodels, the CPDs. If this decomposition exploits all regularities, causal model theory describes what we can expect from such a system, for example which conditional independencies appear, or what the effect of interventions will be. In the absence of more information, the model offers a good hypothesis about reality. This assumption is supported by the fact that science relies on falsification rather than on confirmation (Popper): one can never prove that a hypothesis is invariably correct, one can only search for observations that refute it.

References

[1] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-specific independence in Bayesian networks. In Uncertainty in Artificial Intelligence.
[2] N. Cartwright. What is wrong with Bayes nets? The Monist.
[3] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc.
[4] D. Freedman and P. Humphreys. Are there algorithms that discover causal structure? Synthese, 121:29-54.
[5] P. Gács, J. Tromp, and P. M. B. Vitányi. Algorithmic statistics. IEEE Trans. Inform. Theory, 47(6).
[6] K. B. Korb and E. Nyberg. The power of intervention. Minds and Machines, 16(3).
[7] J. Lemeire, S. Maes, S. Meganck, and E. Dirkx. The representation and learning of equivalent information in causal models. Technical Report IRIS-TR-0099, Vrije Universiteit Brussel.
[8] M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer Verlag.
[9] R. M. Oliver and J. Q. Smith. Influence Diagrams, Belief Nets and Decision Analysis. Wiley.
[10] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, CA.
[11] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press.
[12] J. Rissanen. Modeling by shortest data description. Automatica, 14.
[13] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Springer Verlag, 2nd edition.
[14] W. Spohn. Bayesian nets are all there is to causal dependence. In Stochastic Causality, Maria Carla Galavotti, ed. CSLI Lecture Notes.
[15] P. M. B. Vitányi. Meaningful information. In P. Bose and P. Morin, editors, ISAAC, volume 2518 of Lecture Notes in Computer Science. Springer.
[16] C. S. Wallace. Statistical and Inductive Inference by Minimum Message Length. Springer.
[17] C. S. Wallace and D. L. Dowe. An information measure for classification. Computer Journal, 11(2).
[18] J. Williamson. Bayesian Nets and Causality: Philosophical and Computational Foundations. Oxford University Press, 2005.


4.1 Notation and probability review Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

CS Lecture 3. More Bayesian Networks

CS Lecture 3. More Bayesian Networks CS 6347 Lecture 3 More Bayesian Networks Recap Last time: Complexity challenges Representing distributions Computing probabilities/doing inference Introduction to Bayesian networks Today: D-separation,

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Lecture Notes in Machine Learning Chapter 4: Version space learning

Lecture Notes in Machine Learning Chapter 4: Version space learning Lecture Notes in Machine Learning Chapter 4: Version space learning Zdravko Markov February 17, 2004 Let us consider an example. We shall use an attribute-value language for both the examples and the hypotheses

More information

Motivation. Bayesian Networks in Epistemology and Philosophy of Science Lecture. Overview. Organizational Issues

Motivation. Bayesian Networks in Epistemology and Philosophy of Science Lecture. Overview. Organizational Issues Bayesian Networks in Epistemology and Philosophy of Science Lecture 1: Bayesian Networks Center for Logic and Philosophy of Science Tilburg University, The Netherlands Formal Epistemology Course Northern

More information

Recall from last time. Lecture 3: Conditional independence and graph structure. Example: A Bayesian (belief) network.

Recall from last time. Lecture 3: Conditional independence and graph structure. Example: A Bayesian (belief) network. ecall from last time Lecture 3: onditional independence and graph structure onditional independencies implied by a belief network Independence maps (I-maps) Factorization theorem The Bayes ball algorithm

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Preliminaries Bayesian Networks Graphoid Axioms d-separation Wrap-up. Bayesian Networks. Brandon Malone

Preliminaries Bayesian Networks Graphoid Axioms d-separation Wrap-up. Bayesian Networks. Brandon Malone Preliminaries Graphoid Axioms d-separation Wrap-up Much of this material is adapted from Chapter 4 of Darwiche s book January 23, 2014 Preliminaries Graphoid Axioms d-separation Wrap-up 1 Preliminaries

More information

Bayesian Networks. Motivation

Bayesian Networks. Motivation Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations

More information

6.867 Machine learning, lecture 23 (Jaakkola)

6.867 Machine learning, lecture 23 (Jaakkola) Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context

More information

Identifying Linear Causal Effects

Identifying Linear Causal Effects Identifying Linear Causal Effects Jin Tian Department of Computer Science Iowa State University Ames, IA 50011 jtian@cs.iastate.edu Abstract This paper concerns the assessment of linear cause-effect relationships

More information

Introduction to Bayesian Learning

Introduction to Bayesian Learning Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

More information

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine CS 484 Data Mining Classification 7 Some slides are from Professor Padhraic Smyth at UC Irvine Bayesian Belief networks Conditional independence assumption of Naïve Bayes classifier is too strong. Allows

More information

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science

More information

Recovering Probability Distributions from Missing Data

Recovering Probability Distributions from Missing Data Proceedings of Machine Learning Research 77:574 589, 2017 ACML 2017 Recovering Probability Distributions from Missing Data Jin Tian Iowa State University jtian@iastate.edu Editors: Yung-Kyun Noh and Min-Ling

More information

Summary of the Bayes Net Formalism. David Danks Institute for Human & Machine Cognition

Summary of the Bayes Net Formalism. David Danks Institute for Human & Machine Cognition Summary of the Bayes Net Formalism David Danks Institute for Human & Machine Cognition Bayesian Networks Two components: 1. Directed Acyclic Graph (DAG) G: There is a node for every variable D: Some nodes

More information