CAUSAL MODELS: THE MEANINGFUL INFORMATION OF PROBABILITY DISTRIBUTIONS
Jan Lemeire, Erik Dirkx
ETRO Dept., Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

ABSTRACT

This paper claims that causal model theory describes the meaningful information of probability distributions after a factorization. If the minimal factorization of a distribution is incompressible, its Kolmogorov minimal sufficient statistic, the parents lists, can be represented by a directed acyclic graph (DAG). We show that a faithful Bayesian network is a minimal factorization and that a Bayesian network with random and unrelated conditional probability distributions (CPDs) is faithful and thus a minimal factorization. The validity of faithfulness depends on the presence of other regularities. The Bayesian network is a canonical representation: it uniquely decomposes the distribution into independent submodels, the CPDs. In the absence of further information, we may assume modularity and that the model offers a good hypothesis about the underlying mechanisms of the system.

KEY WORDS
Causal Models, Kolmogorov Complexity, Meaningful Information, Reductionism.

1 Introduction

Kolmogorov complexity gives an objective measure of the complexity of an object, which allows the formal application of Occam's razor to modeling. The central idea is that modeling can be equated with finding regularities in data. An objective property of a regularity is its ability to compress the data, i.e. to describe the data using fewer symbols than are needed to describe the data literally. The more regularities there are, the more the data can be compressed. The regularities of the data constitute its meaningful information. A good model captures only the regularities, not the accidental, random information of the data. The simplest model that does so is called the Kolmogorov minimal sufficient statistic for the data.
The theory of causal models, as developed by Pearl, Verma, Spirtes, Glymour, Scheines et al., gives a probabilistic view on causation and is based on the theory of Bayesian networks. A Bayesian network consists of a directed acyclic graph (DAG) and a conditional probability distribution (CPD) for each variable. A causal model gives a causal interpretation to the edges of a faithful Bayesian network. The causal interpretation of the edges and the validity of faithfulness are often criticized [4, 2, 18]. The causal interpretation is defined as the ability to predict the effect of changes to the system (the so-called interventions). It is based on the modularity assumption. Faithfulness demands that the model reflect all conditional independencies of the probability distribution. Conditional independencies are qualitative properties and, as we will show, are regularities that allow compression of the description of the probability distribution. Through these regularities, we establish a correspondence between causal model theory and the Kolmogorov minimal sufficient statistic. Maximal use of the independencies leads to the minimal factorization. We show that in the absence of other regularities the model is faithful and results in a decomposition of the distribution into independent CPDs, which supports the modularity assumption. The next section shows how meaningful information can be separated from random information by use of the Kolmogorov structure function. Section 3 defines causal models and section 4 discusses related work. Section 5 applies the minimality principle to distributions, and in the following section the correspondence with causal model theory is shown. Finally, section 7 discusses the assumptions.

2 Meaningful Information

The Kolmogorov complexity of an object x is defined as the length of the shortest computer program that prints x and then halts [8]:

K(x) = \min_{p : U(p) = x} l(p)   (1)

with U a universal computer.
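Kolmogorov complexity itself is uncomputable, but any real compressor gives an upper bound on K(x), which makes the compression-equals-regularity idea concrete. The following sketch is our illustration, not part of the paper; it uses Python's standard zlib module as a stand-in for the shortest program:

```python
import zlib
import random

def compressed_size(data: bytes) -> int:
    """Upper bound on the descriptive complexity of data, in bytes."""
    return len(zlib.compress(data, 9))

# A highly regular 10000-byte string: a single repeated pattern.
regular = b"01" * 5000

# A string with no exploitable regularity: independent random bytes.
random.seed(42)
noisy = bytes(random.getrandbits(8) for _ in range(10000))

print(compressed_size(regular))  # a few dozen bytes: the regularity compresses
print(compressed_size(noisy))    # around 10000 bytes: nothing to exploit
```

In the paper's terms, the repeated pattern is the meaningful information of the first string, while the second string consists almost entirely of accidental, random information.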
p gives the shortest description of x, but not all bits of p can be regarded as containing meaningful information. We consider meaningful information as the regularities that allow compression of x [15]. We therefore seek a description in two parts: one containing the meaningful information, which we put in the model, and one containing the remaining random noise, which we put in the data-to-model code. A model can be related to a model set, containing all objects the model can represent. We look for a model set S that contains x and the objects that share x's regularities. The Kolmogorov structure function of x is defined as the log-size of the smallest set including x which
can be described with no more than k bits [3]:

K(k, x) = \min_{p : l(p) \le k, U(p) = S, x \in S} \log|S|   (2)

with |S| the size of the set S. A typical graph of the structure function is illustrated in Figure 1.

Figure 1. Kolmogorov structure function for an n-bit string x; k* is the complexity of the Kolmogorov minimal sufficient statistic of x.

We start with k = 0 and increase the allowed complexity k of the model set. When k = 0, the only set that can be described is the entire set {0, 1}^n, so the corresponding log set size is n. As we increase k, the model can take advantage of the regularities of x, in such a way that each bit reduces the set's size by more than half: the slope of the curve is smaller than -1. When all regularities are exploited, each additional bit of k reduces the set by half, and we proceed along a line of slope -1 until k = K(x), where the smallest set that can be described is the singleton {x}. The curve K(S) + \log|S| is also shown on the graph; it represents the descriptive complexity of x using the two-part code. From k = k* on it reaches its minimum and equals K(x). For random strings the curve starts at \log|S| = n for k = 0 and drops with slope -1 until reaching the x-axis at k = n: each bit of k reveals one of the bits of x and reduces the set by half. The Kolmogorov minimal sufficient statistic is defined as the program p* that describes the smallest set S* such that K(S*) + \log|S*| \le K(x) [5]. The two-stage description of x is then as good as the best single-stage description of x. The descriptive complexity of S* is k*.

3 Causal Models

We elaborate the theory of causal models, or causally interpreted Bayesian networks, in three steps. First, we show how Bayesian networks describe probability distributions. Secondly, faithful models are defined as models describing all independencies of a distribution. Ultimately, a causal interpretation is given to the network.

3.1 Representation of Distributions

Causal models offer a probabilistic interpretation of causality. They are fundamentally Bayesian networks, which offer dense representations of joint distributions. A joint distribution is defined over a set of stochastic variables X_1, ..., X_n and assigns a probability P \in [0, 1] to each possible state (x_1, ..., x_n) \in X_{1,dom} \times ... \times X_{n,dom}, where X_{i,dom} stands for the domain of variable X_i. The joint distribution can be factorized relative to a variable ordering (X_1, ..., X_n) as follows:

P(X_1, ..., X_n) = \prod_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})   (3)

Variable X_j can be removed from the conditioning set of variable X_i if X_i becomes conditionally independent of X_j by conditioning on the rest of the set:

P(X_i | X_1, ..., X_{i-1}) = P(X_i | X_1, ..., X_{j-1}, X_{j+1}, ..., X_{i-1}).   (4)

Such conditional independencies reduce the complexity of the factors in the factorization. The conditioning sets of the factors can be described by a directed acyclic graph (DAG), in which each node represents a variable and has incoming edges from all variables of the conditioning set of its factor. The joint distribution is then described by the DAG and the conditional probability distributions (CPDs) of the variables conditioned on their parents: P(X_i | parents(X_i)). A Bayesian network is a factorization that is minimal, in the sense that no edge can be deleted without destroying the correctness of the factorization. Although a Bayesian network is minimal, it depends on the chosen variable ordering. Some orderings lead to the same network, but others result in different topologies. Take five stochastic variables A, B, C, D and E. Fig. 2 shows the graph that was constructed by simplifying the factorization based on the variable ordering (A, B, C, D, E) by three given conditional independencies.

Figure 2. Factorization based on variable ordering (A, B, C, D, E) and reduction by three independencies.

However, the Bayesian network describing the same distribution but
based on the ordering (A, B, C, E, D) and depicted in Fig. 3 contains two edges fewer, thanks to five useful independencies.

Figure 3. Bayesian network based on variable ordering (A, B, C, E, D) and five independencies.

We define the minimal factorization as the factorization that has the fewest variables in its conditioning sets.

3.2 Representation of Independencies

Pearl, Verma, and others started to interpret the DAG of a Bayesian network as a representation of the conditional independencies of a joint distribution [10]. They constructed a graphical criterion, called d-separation, for retrieving from the graph the independencies that follow from the Markov condition, which states that a node becomes independent of its non-descendants by conditioning on its parents. Take the graph of Fig. 3. The d-separation criterion tells us that variable B separates A from E, since B blocks the path A → B → E. On the other hand, the path A → C → D ← E is blocked by C → D ← E, which is called a v-structure. This path gets unblocked given D. The Markov condition holds for a Bayesian network, so every independency found with the d-separation criterion in its DAG appears in the distribution. A Bayesian network is called faithful to the distribution if it represents all conditional independencies of the distribution.

3.3 Representation of Causal Relations

Where Bayesian networks are mainly concerned with offering a dense and manageable representation of joint distributions, causal models intend to graphically describe the structure of the underlying physical mechanisms governing a system under study. In a causal model the state of each variable, represented by a node in the graph, is generated by a stochastic process that is determined by the values of its parent variables in the graph. The stochastic variation of this assignment is assumed to be independent of the variations in all other assignments.
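The d-separation criterion is mechanical enough to state as code. The sketch below is our illustration, not the paper's; it encodes a DAG with the edge set A→B, A→C, B→E, C→D, E→D, which is our reconstruction of Fig. 3 from the equations used later in the proofs, and checks each undirected path against the blocking rules:

```python
EDGES = {("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("E", "D")}

def parents(v):
    return {a for (a, b) in EDGES if b == v}

def children(v):
    return {b for (a, b) in EDGES if a == v}

def descendants(v):
    out, stack = set(), [v]
    while stack:
        for c in children(stack.pop()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def paths(x, y, seen=()):
    """All simple undirected paths from x to y."""
    seen = seen + (x,)
    if x == y:
        yield seen
        return
    for n in parents(x) | children(x):
        if n not in seen:
            yield from paths(n, y, seen)

def blocked(path, given):
    """A path is blocked if some non-collider on it is in the conditioning
    set, or some collider is outside it and has no descendant in it."""
    for i in range(1, len(path) - 1):
        node = path[i]
        is_collider = path[i - 1] in parents(node) and path[i + 1] in parents(node)
        if is_collider:
            if node not in given and not (descendants(node) & given):
                return True
        elif node in given:
            return True
    return False

def d_separated(x, y, given=frozenset()):
    return all(blocked(p, set(given)) for p in paths(x, y))

print(d_separated("A", "E", {"B"}))       # True: B blocks A -> B -> E
print(d_separated("A", "E", {"B", "D"}))  # False: conditioning on D opens the v-structure
```

The two printed cases reproduce the example in the text: B separates A from E, but adding D to the conditioning set unblocks the path through the v-structure C → D ← E.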
Each assignment process remains invariant to possible changes in the assignment processes that govern other variables in the system. This modularity assumption enables the prediction of the effect of interventions, which are defined as specific modifications of some factors in the product of the factorization (Eq. 3) [11]. A causal model corresponds to a joint distribution defined over the variables, and this results in a close connection between causal and probabilistic dependence [14]. For a causal model, the Causal Markov Condition tells us how variables depend on each other: each variable is probabilistically independent of its non-effects conditional on its direct causes. The probabilistic aspect of the condition is similar to the Markov condition. Hence, a causal model can be regarded as a Bayesian network in which all edges are interpreted as representing causal influences between the corresponding variables. This interpretation represents the second aspect of the Causal Markov Condition: every probabilistic dependence must have a causal explanation (the so-called Principle of the Common Cause) [18]. Furthermore, causal model theory is based on the Minimality Principle (minimality of the model) and the Faithfulness Property (the model describes all independencies). Spirtes, Glymour and Scheines rely in their work on causal models on an axiomatization of these three conditions [13].

4 Related Work

Kolmogorov complexity and related methods, such as Minimum Message Length (MML) [17, 16] and Minimum Description Length (MDL) [12], are mostly used for selecting the best model from a given set of models. The choice of the model class, however, determines the regularities that are considered. In our discussion, we try not to stick to an a priori chosen set of regularities, but search for the relevant regularities.
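Modularity is what makes interventions computable: an intervention replaces exactly one factor in the product of Eq. 3 and leaves all other CPDs untouched. The following sketch (our illustration, with made-up numbers) contrasts observing with intervening in a two-variable model A → B:

```python
# Factorization P(A, B) = P(A) * P(B|A) for binary A -> B (illustrative numbers).
P_A = {0: 0.7, 1: 0.3}
P_B_given_A = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}

def joint(p_a, p_b_given_a):
    return {(a, b): p_a[a] * p_b_given_a[a][b] for a in (0, 1) for b in (0, 1)}

# Observing B = 1 changes our belief about A (Bayesian conditioning).
obs = joint(P_A, P_B_given_A)
p_a1_given_b1 = obs[(1, 1)] / (obs[(0, 1)] + obs[(1, 1)])

# Intervening, do(B = 1): replace only the factor P(B|A) by a point mass.
point_mass = {a: {0: 0.0, 1: 1.0} for a in (0, 1)}
intervened = joint(P_A, point_mass)
p_a1_after_do = intervened[(1, 0)] + intervened[(1, 1)]

print(round(p_a1_given_b1, 3))  # 0.774: evidence flows against the arrow
print(p_a1_after_do)            # 0.3: setting B tells us nothing about its non-effect A
```

Observation raises P(A = 1) from 0.3 to about 0.774, while the intervention leaves it at 0.3: replacing one CPD does not touch the mechanism generating A, which is exactly the modularity assumption.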
In a theorem in [11], Pearl describes for which distributions faithful graphs exist and can be learned: the absence of d-separation implies dependence in almost all distributions compatible with the graph G. The reason is that a precise tuning of the parameters is required to generate an independency along an unblocked path in the diagram, and such tuning is unlikely to occur in practice. Pearl solves this problem by imposing a stability restriction on the distribution [11, sec. 2.4]: the occurrence of any independency must remain invariant to any change in the distributional parametrization of the graph. This corresponds to regularities in the CPDs, as will be proved by Theorem 4; a change of the CPDs would break the regularity. Pearl claims that there exists at least one distribution faithful to the model, while we show that all typical models of the DAG model set are faithful. The interventions viewpoint on causality describes only one aspect of causality; see [18] for an overview of different views.

5 Minimal Description of Distributions

A joint distribution P(X_1, ..., X_n) can be described more compactly by a factorization that is reduced by conditional independencies. The minimal factorization leads to P(X_1, ..., X_n) = \prod_i CPD_i. The descriptive size of the CPDs is determined by the number of variables in the conditioning sets. The total number of conditioning variables thus defines the shortest factorization. A two-part description is then:

descr(P(X_1, ..., X_n)) = {parents(X_1), ..., parents(X_n)} + {CPD_1, ..., CPD_n}   (5)

Note that the parents lists can be described very compactly, for example with an n-bit string per variable in which bit i is 1 if X_i is present in the list. The following theorems show that the first part offers the minimal model if the CPDs are random and unrelated.

Theorem 1 The parents lists, {parents(X_1), ..., parents(X_n)}, in the two-part code given by Eq. 5 contain meaningful information of a probability distribution.

Every variable X_j that can be eliminated from the conditioning set of X_i due to a conditional independency as stated by Eq. 4 results in a reduction of the descriptive complexity by

(|X_{i,dom}| - 1) . |X_{1,dom}| ... |X_{j-1,dom}| . (|X_{j,dom}| - 1) . |X_{j+1,dom}| ... |X_{i-1,dom}| . d   (6)

with |X_{k,dom}| the size of the domain of X_k and d the precision in bits to which each probability is described. The description of variable X_j in the parents list takes no more than \log n bits, which is almost always lower than the above complexity reduction (except when d is taken absurdly small). Every bit of the parents lists reduces the descriptive complexity by more than one bit and, hence, is meaningful information.

Theorem 2 If the two-part code description of a probability distribution, given by Eq. 5, results in an incompressible string, the first part is a Kolmogorov minimal sufficient statistic.

If a more compact description of the distribution existed, the two-part decomposition would contain redundant bits. Theorem 1 showed that the first part contains meaningful information. The second part does not, since it is incompressible. The first part, described minimally, is therefore the Kolmogorov minimal sufficient statistic.

The distribution decomposes uniquely and minimally into the CPDs (there can be multiple minimal factorizations, but they are closely related; we come back to this in the next section), which are atomic and independent. The decomposition thus offers a canonical representation. The system under study is decomposed into independent subsystems that are only connected via the variables. In the absence of further information, we may assume that each CPD represents a part of reality. This implies modularity: one subsystem can be replaced by another without affecting the rest of the system.

6 Equivalence with Causal Model Theory

We hypothesize that the above decomposition is equivalent to the theory of causal models. The relation between both is proved by two theorems.

6.1 Relation between Minimal Factorizations and Bayesian Networks

Theorem 3 If a faithful Bayesian network exists for a distribution, it is the minimal factorization.

Oliver and Smith define the conditions for sound transformations of Bayesian networks, where sound means that the transformation does not introduce extraneous independencies [9]. No edge removal is permitted, only reorientation and addition of edges. Additionally, if a reorientation destroys a v-structure or creates a new one, an edge should be added connecting the common parents in the former or in the newly created v-structure. Such transformations, however, eliminate some independencies represented by the original graph. Assume the existence of a Bayesian network based on a different variable ordering that has fewer edges than the faithful network. It must be possible to transform one into the other. The network has fewer edges, so edges must be added by the transformation, and this destroys independencies. But the network cannot represent more independencies, because the faithful network represents all independencies. The assumption leads to a contradiction.

Theorem 4 A Bayesian network with unrelated, random conditional probability distributions (CPDs) is faithful.

Recall that a Bayesian network is a factorization that is edge-minimal. This means that for each parent pa_{i,j} of variable X_i it holds that

P(X_i | pa_{i,1}, ..., pa_{i,j}, ..., pa_{i,k}) \neq P(X_i | pa_{i,1}, ..., pa_{i,j-1}, pa_{i,j+1}, ..., pa_{i,k})   (7)

The proof will show that any two variables that are d-connected are dependent, unless the probabilities of the CPDs are related. We consider the following possibilities: the two variables can be adjacent (a), related by a Markov chain (b) (recall that a Markov chain is a path not containing v-structures), a v-structure (c), a combination of both, or connected by multiple paths (d). First we prove that a variable marginally depends on each of its adjacent variables (a). Consider nodes D and E of the Bayesian network of Fig. 3. In order not to overload the proof, we will demonstrate that P(D|E) \neq P(D), but the proof can easily be generalized. The first term can be
written as:

P(D|E) = P(D|E, c_1) . P(c_1) + P(D|E, c_2) . P(c_2) + ...   (8)

with c_1, c_2 \in C_{dom}. C is also a parent of D; thus, by Eq. 7, there are at least two values of C_{dom} for which P(D|E, c_i) \neq P(D|E) (P(D|E) is a weighted average of the P(D|E, C); if one probability P(D|E, c_1) lies above this average, at least one other value must lie below it). Take c_1 and c_2 to be such values, so that P(D|E, c_1) \neq P(D|E, c_2). There are also at least two such values of E_{dom}; take e_1 and e_2. Eq. 8 should hold for all values of E and equal P(D) to get an independency. This results in the following relation among the probabilities:

P(D|e_1, c_1) . P(c_1) + P(D|e_1, c_2) . P(c_2) = P(D|e_2, c_1) . P(c_1) + P(D|e_2, c_2) . P(c_2)   (9)

Note that the equation cannot be reduced: the conditional probabilities are equal neither to P(D) nor to each other. Next, by the same arguments it can be proved that variables connected by a Markov chain are by default dependent (b). Take A → B → E in Fig. 3; independence of A and E requires that

P(E|a) = \sum_{b \in B_{dom}} P(E|b) . P(b|a) = P(E)   \forall a \in A_{dom}   (10)

and this also results in a regularity among the CPDs. In a v-structure, both causes are dependent when conditioned on their common effect (c): for C → D ← E, P(D|C, E) \neq P(D|E) is true by Eq. 7. Finally, if there are multiple unblocked paths connecting two variables, then independence of both variables implies a regularity, too (d). Take A and D in Fig. 3:

P(D|A) = \sum_{b \in B_{dom}} \sum_{c \in C_{dom}} \sum_{e \in E_{dom}} P(D|c, e) . P(c|A) . P(e|b) . P(b|A)   (11)

Note that P(c, e|A) = P(c|A) . P(e|A) follows from the independence of C and E given A. All factors in the equation satisfy Eq. 7, so the equation only equals P(D) if there is a relation among the CPDs. Table 1 gives an example distribution P(D|E, C) for which D and E are independent, assuming that P(C = 0) = P(C = 1) = 0.5; the regularity of Eq. 9 applies to that distribution. From the theorem it follows that the Bayesian network is a minimal factorization. Bayesian networks not based on a minimal factorization, such as the one of Fig.
2, are always compressible, namely by the regularities among the CPDs that follow from the independencies not represented by them.

Table 1. Example of a CPD P(D|C, E) for which P(D|E) = P(D), assuming that P(C = 0) = P(C = 1) = 0.5.

Multiple faithful models can exist for a distribution, though. These models represent the same set of independencies and are therefore statistically indistinguishable: they define a Markov-equivalence class. It has been proved that they share the same v-structures and differ only in the orientation of the edges [11]. The corresponding factorizations have the same number of conditioning variables and thus all have the same complexity. The observations cannot decide on the correct model, but we have demarcated a set of closely related models which contains the correct model.

6.2 Equivalence

The conditions for causal models, Minimality, Faithfulness and the Causal Markov Condition (section 3.3), are fulfilled for a minimal factorization with random CPDs. Minimality holds by definition, faithfulness is proved by Theorem 4, and the conditional independencies that follow from the Markov condition are present since it is a valid Bayesian network. Finally, the causal interpretation of the edges is correct as long as we define causality in terms of interventions. The modularity of the decomposition captures Pearl's interventions: an intervention, which Pearl considers an atomic operation, can be seen as replacing one specific CPD with a CPD that allows perfect control over the variable (for setting it to a certain state). We hypothesize that the consequences of causal models, like d-separation and the inference and identifiability algorithms, conform with the CPD decomposition.
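The balancing regularity of Eq. 9 behind Table 1 is easy to reproduce numerically. The sketch below is our illustration (the paper's actual table entries are not reproduced, and for simplicity C and E are treated as independent, as the P(c) weights in Eq. 8 effectively do). It tunes a CPD P(D|C, E) so that both columns average to the same value under P(C = 0) = P(C = 1) = 0.5, making D independent of E despite the edge E → D, and then shows that perturbing a single entry destroys the independence:

```python
# P(C) uniform, as assumed for Table 1.
P_C = {0: 0.5, 1: 0.5}

# P(D=1 | C, E), with entries tuned so that both values of E average to 0.5:
#   0.5*0.75 + 0.5*0.25 = 0.5   and   0.5*0.375 + 0.5*0.625 = 0.5
P_D1 = {(0, 0): 0.75, (1, 0): 0.25,
        (0, 1): 0.375, (1, 1): 0.625}

def p_d1_given_e(e):
    # Marginalize the parent C out of the CPD.
    return sum(P_C[c] * P_D1[(c, e)] for c in (0, 1))

print(p_d1_given_e(0), p_d1_given_e(1))  # 0.5 0.5: D is independent of E

# A generic perturbation of one CPD entry breaks the precise tuning:
P_D1[(0, 0)] = 0.8
print(p_d1_given_e(0) == p_d1_given_e(1))  # False: the dependence reappears
```

This is the instability Pearl objects to: the independence only survives as long as the CPD entries keep their exact mutual relation, which is why it does not occur for random, unrelated CPDs.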
They depend solely on the CPDs and the variables that link them. Take the flow of information through a causal model. In the model of Fig. 4, variables D and E contain information about A. This information is captured by C. The decrease of uncertainty about A depends on the information that D or E provide about C, but is independent of whether the information comes from D or E. C screens A off from D and E, and also D from E. The interaction between the variables happens via C and is represented by the edges. The graphical representation of a causal model suggests that the edges constitute the atomic elements of the model. This cannot, however, explain the interaction between A and B. C does not screen A off from B. Moreover, C should be known for there to be a dependency between A and B. This interaction pattern is captured by taking the CPDs as the atomic elements. We can say that
the information travels between the CPDs through the variables.

Figure 4. Example causal model.

Figure 5. Causal model in which A is independent of D.

7 Validity

7.1 Validity of Faithfulness

Faithfulness of a causal model is the cornerstone of causal model theory and the accompanying learning algorithms. We showed that a causal model relies on a specific type of regularity: the conditional independencies that follow from the Markov condition. The simplest model should, however, exploit all regularities, and there are regularities that a causal model does not capture. If such regularities appear, the minimal Bayesian network can be either faithful or not. If the model remains faithful, the additional regularities do not interfere with the conditional independencies. They can thus be regarded as regularities of a lower level. A well-known example is when the description of individual CPDs can be compressed further. This regularity is called local structure [1] and appears inside a building block. If the minimal Bayesian network is unfaithful, the regularities generate independencies that do not result from the Markov condition alone. This does not exclude that the distribution might be described minimally by a causal model augmented with a description of the additional regularities, i.e. that the CPD decomposition is still valid. The best-known example of unfaithfulness is when, in the model of Fig. 5, A and D appear to be independent [13]. This happens when the influences along the paths A → B → D and A → C → D exactly balance, so that they cancel each other out and the net effect results in an independence. The independence of A and D is, however, not expected by the causal model. The distribution is not typical for the set of distributions that can be described by the model. d-separation describes the independencies that can be expected from the typical distributions of the causal model set.
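The exact balancing that produces the unfaithful independence of A and D in Fig. 5 can be exhibited in a linear-Gaussian version of the model (our illustration; the paper does not commit to a parametric family). With path coefficients chosen so that the two path products cancel, the covariance of A and D vanishes:

```python
import random

# Linear model for Fig. 5:  B = b1*A + nB,  C = c1*A + nC,  D = b2*B + c2*C + nD.
b1, b2 = 1.0, 0.5
c1, c2 = 1.0, -0.5   # tuned so that b1*b2 + c1*c2 = 0: the paths cancel exactly

# Path rule for linear models: Cov(A, D) = (b1*b2 + c1*c2) * Var(A).
cov_AD_exact = (b1 * b2 + c1 * c2) * 1.0
print(cov_AD_exact)  # 0.0

# Simulation check: the sample covariance is near zero although both
# paths A -> B -> D and A -> C -> D are unblocked.
random.seed(0)
n, acc = 100_000, 0.0
for _ in range(n):
    A = random.gauss(0, 1)
    B = b1 * A + random.gauss(0, 1)
    C = c1 * A + random.gauss(0, 1)
    D = b2 * B + c2 * C + random.gauss(0, 1)
    acc += A * D
print(abs(acc / n) < 0.05)  # True
```

Since the variables are jointly Gaussian, zero covariance means full independence; yet any generic change to one coefficient restores the dependence, which is exactly why such distributions are not typical for the model set.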
Distributions with deterministic or functional relations cannot be represented by a faithful graph either [13]. In [7] we show that this is related to the violation of the intersection condition, one of the conditions that Pearl imposes on a distribution in the elaboration of causal theory and its algorithms [10]. The solution we proposed in [7] is to incorporate the information about deterministic relations in an augmented causal model, and to extend the d-separation criterion so that it can be used to retrieve all conditional independencies from the model. In this way, the faithfulness of the model can be reestablished, and the model again incorporates all regularities of the data. These examples do not challenge the validity of the causal interpretation of the model. The next section focuses on other counterexamples.

7.2 Validity of the CPD Decomposition

The CPD decomposition of a joint distribution implies that the CPDs represent independent mechanisms. In the model of Fig. 4, CPD_D and CPD_E are independent; the states of D and E only depend on C. This decomposition is, however, not valid for all systems: for some systems, the CPDs do not represent independent mechanisms. Take the example of particle decay, one of the counterexamples to the Causal Markov Condition reported in [18], p. 55, taken from van Fraassen (1980, p. 29): suppose that a particle decays into two parts, that conservation of total momentum obtains, and that it is not determined by the prior state of the particle what the momentum of each part will be after the decay. By conservation, the momentum of one part will be determined by the momentum of the other part. By indeterminism, the prior state of the particle will not determine what the momenta of each part will be after the decay. Thus there is no prior screener-off. The prior state S fails to screen off the momenta. But by symmetry, neither of the two parts' momenta M_1 and M_2 can be the cause of the other.
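The failure of screening off in the decay example can be made concrete with a toy distribution (our illustration, with made-up numbers): the prior state S does not fix M_1, and conservation forces M_2 = -M_1, so conditioning on M_1 still changes the distribution of M_2 even when S is already given:

```python
# Prior state S of the particle; given S, M1 is +1 or -1 with equal
# probability (indeterminism), and conservation forces M2 = -M1.
P_S = {0: 0.5, 1: 0.5}
joint = {}
for s in P_S:
    for m1 in (-1, 1):
        joint[(s, m1, -m1)] = P_S[s] * 0.5

def p_m2(s, m2, m1=None):
    """P(M2 = m2 | S = s), or P(M2 = m2 | S = s, M1 = m1) if m1 is given."""
    match = lambda k: k[0] == s and (m1 is None or k[1] == m1)
    den = sum(p for k, p in joint.items() if match(k))
    num = sum(p for k, p in joint.items() if match(k) and k[2] == m2)
    return num / den

print(p_m2(0, 1))          # 0.5: given S alone, M2 is undetermined
print(p_m2(0, 1, m1=-1))   # 1.0: given S and M1, M2 is fixed
```

Since P(M_2 | S, M_1) differs from P(M_2 | S), no prior variable screens the momenta off from each other, which is the Causal Markov Condition violation the quoted passage describes.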
This system cannot be represented by a causal model. The generation of M_1 and M_2 by S should be considered as one (causal) mechanism, as shown in Fig. 6. Some of the other counterexamples to the Causal Markov Condition given in [18] are similar. Take the set of strings of n bits for which m consecutive bits are 1 and the others are 0. For n = 8 and m = 2, a valid string consists of two adjacent ones among eight zeroed bits (e.g. 11000000). Every bit can be regarded as a discrete variable. By picking valid strings randomly, the joint distribution is observed. All bits
are correlated, but each pair becomes independent by conditioning on some other bits. The simplest model for this pattern contains a latent variable, denoting the start position of the non-zero bit sequence. The causal model, shown in Fig. 7(a), however, considers each edge as a separate mechanism. But the mechanisms are not unrelated, so the decomposition is not valid. The model fails to represent the many conditional independencies. The model of Fig. 7(b) is more accurate: it indicates that there is one mechanism generating the states of all bits.

Figure 6. Particle with state S decays into two parts with momenta M_1 and M_2.

Figure 7. Two models for a pattern in an 8-bit string.

8 Conclusions

The conditional independencies on which causal model theory is based can be regarded as the regularities that allow compression of distributions and the construction of minimal models. We showed that the meaningful information of a causal model lies in its DAG, which defines the decomposition of the distribution into independent submodels, the CPDs. If this decomposition exploits all regularities, causal model theory describes what we can expect from such a system: for example, which conditional independencies appear, or the effect of interventions. In the absence of more information, the model offers a good hypothesis about reality. This assumption is supported by the fact that science relies on falsification rather than on confirmation (Popper): one can never prove that a hypothesis is invariably correct, one can only search for observations that refute the hypothesis. Finally, note that faithfulness can be interpreted in a broader sense as the ability of a model to explain all regularities of the data.

References

[1] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-specific independence in Bayesian networks. In Uncertainty in Artificial Intelligence, 1996.
[2] N. Cartwright. What is wrong with Bayes nets? The Monist, 84(2), 2001.
[3] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., 1991.
[4] D. Freedman and P. Humphreys. Are there algorithms that discover causal structure? Synthese, 121:29-54, 1999.
[5] P. Gács, J. Tromp, and P. M. B. Vitányi. Algorithmic statistics. IEEE Trans. Inform. Theory, 47(6), 2001.
[6] K. B. Korb and E. Nyberg. The power of intervention. Minds and Machines, 16(3), 2006.
[7] J. Lemeire, S. Maes, S. Meganck, and E. Dirkx. The representation and learning of equivalent information in causal models. Technical Report IRIS-TR-0099, Vrije Universiteit Brussel.
[8] M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer Verlag, 1997.
[9] R. M. Oliver and J. Q. Smith. Influence Diagrams, Belief Nets and Decision Analysis. Wiley, 1990.
[10] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, CA, 1988.
[11] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
[12] J. Rissanen. Modeling by shortest data description. Automatica, 14:465-471, 1978.
[13] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Springer Verlag, 2nd edition, 2000.
[14] W. Spohn. Bayesian nets are all there is to causal dependence. In Stochastic Causality, Maria Carla Galavotti et al., editors. CSLI Lecture Notes, 2001.
[15] P. M. B. Vitányi. Meaningful information. In P. Bose and P. Morin, editors, ISAAC, volume 2518 of Lecture Notes in Computer Science. Springer, 2002.
[16] C. S. Wallace. Statistical and Inductive Inference by Minimum Message Length. Springer, 2005.
[17] C. S. Wallace and D. L. Dowe. An information measure for classification. Computer Journal, 11(2).
[18] J. Williamson. Bayesian Nets and Causality: Philosophical and Computational Foundations. Oxford University Press, 2005.
More information