COMP538: Introduction to Bayesian Networks


1 COMP538: Introduction to Bayesian Networks
Lecture 9: Optimal Structure Learning

Nevin L. Zhang
Department of Computer Science and Engineering
Hong Kong University of Science and Technology

Spring 2007

2 Introduction

A good structure learning algorithm should, among other things, discover the truth provided there is sufficient data.

Formulation of this intuition: suppose sufficient data are sampled from a true BN model. A good learning algorithm should be able to reconstruct the true model from the data.

Objective of this lecture: assume (complete) data generated by a BN, and discuss when and how the generating model can be reconstructed from data.

3 References

Chickering, D. M. (1995). A transformational characterization of equivalent Bayesian network structures. In Proc. 11th Conf. on Uncertainty in Artificial Intelligence.
Chickering, D. M. (2002). Learning Equivalence Classes of Bayesian-Network Structures. Journal of Machine Learning Research, 2.
Chickering, D. M. (2002b). Optimal Structure Identification with Greedy Search. Journal of Machine Learning Research, 3.
Kocka, T. and Castelo, R. (2001). Improved Learning of Bayesian Networks. In Proc. 17th Conf. on Uncertainty in Artificial Intelligence.
Meek, C. (1997). Graphical models: Selecting causal and statistical models. PhD thesis, Carnegie Mellon University.

4 Outline

1 Model Equivalence
  Conditions for Model Equivalence
  Representing Equivalence Classes of Models
  Model Equivalence and Scoring Functions
2 Model Inclusion
  Model Inclusion and Scoring Functions
3 Optimality Conditions
4 Greedy Equivalence Search (GES)

5 Model Equivalence

BNs represent joint probabilities. Two different BNs are equivalent if they represent the same joint probability.

Equivalence of BN structures: Let S and S′ be two BN models (DAG structures) over variables V. We say S and S′ are equivalent if for any parameterization θ of S, there exists a parameterization θ′ of S′ such that P(V | S, θ) = P(V | S′, θ′), and vice versa.

In words, S can represent any joint distribution that S′ can, and vice versa.

6 Model Equivalence

Examples:
- X → Y and X ← Y are equivalent. (Show this; see the derivation below.)
- In a DAG, if we drop the directions of all edges, we get its skeleton. Trees with the same skeleton are equivalent.

Equivalent models have the same maximized likelihood.
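To fill in the first example (a standard derivation, not spelled out on the slide): any joint distribution factorizes both ways,

P(x, y) = P(x) P(y | x)    (a parameterization of X → Y)
        = P(y) P(x | y)    (a parameterization of X ← Y).

So given a parameterization of X → Y, setting P(y) = Σ_x P(x) P(y | x) and P(x | y) = P(x) P(y | x) / P(y) yields a parameterization of X ← Y representing the same joint, and symmetrically in the other direction.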

7 Model Equivalence and the Markov Property

Theorem (9.1) (Meek 1995): Two BN models S and S′ are equivalent iff they imply, by the global Markov property, the same set of conditional independencies.

8 Model Equivalence and V-Structures

In a DAG, a v-structure is a local pattern X → Z ← Y such that X and Y are not adjacent.

[Figure: two equivalent DAGs over the variables A, S, T, L, B, X, R, D]

Theorem (9.2) (Verma and Pearl 1991): Two BN models are equivalent iff they have the same skeleton and the same v-structures.
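Theorem 9.2 licenses a direct equivalence test. A minimal sketch in Python (the DAG-as-parent-dict representation is an assumption of the sketch, not from the lecture):

def skeleton(dag):
    # Undirected edges of the DAG, as frozensets {X, Y}.
    return {frozenset((p, child)) for child, parents in dag.items() for p in parents}

def v_structures(dag):
    # Triples (X, Z, Y) with X -> Z <- Y and X, Y non-adjacent.
    skel = skeleton(dag)
    vs = set()
    for z, parents in dag.items():
        for x in parents:
            for y in parents:
                if x < y and frozenset((x, y)) not in skel:
                    vs.add((x, z, y))
    return vs

def equivalent(dag1, dag2):
    # Theorem 9.2: same skeleton and same v-structures.
    return skeleton(dag1) == skeleton(dag2) and v_structures(dag1) == v_structures(dag2)

# X -> Y vs X <- Y: equivalent. X -> Z <- Y vs X -> Z -> Y: not equivalent.
print(equivalent({'X': set(), 'Y': {'X'}}, {'X': {'Y'}, 'Y': set()}))  # True
print(equivalent({'X': set(), 'Y': set(), 'Z': {'X', 'Y'}},
                 {'X': set(), 'Z': {'X'}, 'Y': {'Z'}}))                # False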

9 Model Equivalence and Arc Reversal

[Figure: two equivalent DAGs over the variables A, S, T, L, B, X, R, D]

In a DAG, an arc X → Y is covered if pa(Y) = pa(X) ∪ {X}.

Theorem (9.3) (Chickering 1995): Two BN models are equivalent iff there exists a sequence of covered arc reversals which converts one into the other.

Example: The covered arcs A → T and S → L may be reversed; there are several other equivalent models. We cannot reverse T → R, L → R, R → D, or B → D because they are not covered.
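In the same parent-dict representation as the sketch above, the covered-arc check and the reversal operation are short helpers (again a sketch; by Theorem 9.3 the reversal is equivalence-preserving):

def is_covered(dag, x, y):
    # Arc x -> y is covered iff pa(y) = pa(x) ∪ {x}.
    return dag[y] == dag[x] | {x}

def reverse_covered_arc(dag, x, y):
    # Reverse x -> y; equivalence-preserving only for covered arcs.
    assert is_covered(dag, x, y), "only covered arcs may be reversed"
    new_dag = {node: set(parents) for node, parents in dag.items()}
    new_dag[y].discard(x)
    new_dag[x].add(y)
    return new_dag

# Example: X -> Y is covered, and reversing it gives the equivalent model X <- Y.
print(reverse_covered_arc({'X': set(), 'Y': {'X'}}, 'X', 'Y'))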

10 PDAG

cl(S) denotes the class of all BN models equivalent to S.

One way to represent the class (Theorem 9.2): skeleton + v-structures. The resulting graph consists of undirected as well as directed edges, and is called an acyclic partially directed graph (PDAG).

[Figure: three graphs over the variables A, S, T, L, B, X, R, D]

11 Compelled Edge

A directed edge in a BN structure S is compelled if it is in all structures equivalent to S.

[Figure: a DAG over the variables A, S, T, L, B, X, R, D]

By Theorem 9.2, all edges participating in v-structures are compelled. Some other edges might also be compelled. Example: R → X. Any model with R ← X is not equivalent to the structure shown (why?)

12 Essential Graph

The essential graph of a BN structure S is a PDAG
- whose skeleton is the same as that of S, and
- where the compelled edges, and only those edges, are directed.

Also called DAG pattern (Spirtes et al. 1993), completed PDAG (Chickering 2002), and maximally oriented graph (Chickering 1995). Used to represent an equivalence class during learning.

[Figure: a DAG over the variables A, S, T, L, B, X, R, D and its essential graph]

The essential graph of a tree-structured BN model is its skeleton.

13 Computing Essential Graphs

Algorithm for computing the essential graph of a BN structure S:
1 Compute the skeleton of S and orient only the edges participating in v-structures.
2 Orient compelled edges (note: we cannot create additional v-structures). While more edges can be oriented:
  (a) For each X → Z — Y such that X and Y are not adjacent, orient Z — Y as Z → Y.
  (b) For each X — Y such that there is a directed path from X to Y, orient X — Y as X → Y.
  (c) For each X — Z — Y such that X and Y are not adjacent, X → W, Y → W, and Z — W, orient Z — W as Z → W.

(A sketch of Step 2 follows.)
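A sketch of Step 2 under an assumed edge representation (directed edges as (tail, head) pairs, undirected edges as frozensets; neither the representation nor the function names are from the lecture):

def orient_compelled(directed, undirected):
    directed, undirected = set(directed), set(undirected)

    def adjacent(a, b):
        return ((a, b) in directed or (b, a) in directed
                or frozenset((a, b)) in undirected)

    def has_directed_path(src, dst):
        # Depth-first search along directed arcs.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(head for tail, head in directed if tail == node)
        return False

    def should_orient(u, v):
        # Rule (a): some X -> u with X and v non-adjacent.
        if any(head == u and not adjacent(tail, v) for tail, head in directed):
            return True
        # Rule (b): a directed path already runs from u to v.
        if has_directed_path(u, v):
            return True
        # Rule (c): non-adjacent X, Y with u - X, u - Y, X -> v and Y -> v.
        preds = [tail for tail, head in directed if head == v]
        for i, x in enumerate(preds):
            for y in preds[i + 1:]:
                if (frozenset((u, x)) in undirected
                        and frozenset((u, y)) in undirected
                        and not adjacent(x, y)):
                    return True
        return False

    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            a, b = tuple(edge)
            for u, v in ((a, b), (b, a)):
                if should_orient(u, v):
                    undirected.discard(edge)
                    directed.add((u, v))
                    changed = True
                    break
    return directed, undirected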

14 Computing Essential Graphs

Understanding the rules:
- Rule (a): If Z ← Y, we would have an additional v-structure X → Z ← Y.
- Rule (b): If X ← Y, we would have a directed cycle.
- Rule (c): The situation is X — Z — Y with X → W, Y → W, and Z — W. If Z ← W, we would have to orient X → Z and Y → Z to avoid directed cycles. But this would lead to an additional v-structure X → Z ← Y.

15 Model Equivalence

Theorem (9.4) (Meek 1995): Step 2 of the algorithm is sound and complete.
- (Soundness) Edges oriented by the algorithm are compelled edges, and they are oriented correctly.
- (Completeness) The algorithm orients all compelled edges.

Notes: The algorithm is important if we want to search with essential graphs (Chickering 2002), which we do not do in this class. However, we will use this algorithm and these results in the next lecture.

16 Equivalence-Invariant Scoring Functions

A scoring function is equivalence invariant if it gives the same score to equivalent models. Such functions are sometimes also said to be likelihood equivalent or score equivalent.

Equivalence-invariant scoring functions can be used to score equivalence classes. Others cannot.

17 BIC Score

The BIC score is equivalence invariant. Recall:

BIC(S | D) = log P(D | S, θ*) − (d/2) log N

By the definition of equivalence, equivalent models have the same maximized likelihood. According to Theorem 9.3, equivalent models have the same complexity: covered arc reversal does not change complexity (prove this).

The marginal likelihood (i.e. the CH) score is equivalence invariant if one sets the parameter priors properly (Heckerman et al. 1994¹).

¹ Heckerman, D., Geiger, D. and Chickering, D. M. (1994). Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Proc. 10th Conf. on Uncertainty in Artificial Intelligence.
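As a concrete check of equivalence invariance, here is a sketch of the BIC score for complete discrete data (representing data as a list of dicts is an assumption of the sketch, not from the slides):

import math
from collections import Counter

def bic(dag, data, card):
    # dag: node -> set of parents; card: node -> number of states.
    n = len(data)
    loglik, dim = 0.0, 0
    for node, parents in dag.items():
        parents = sorted(parents)
        joint = Counter(tuple(row[p] for p in parents) + (row[node],) for row in data)
        marg = Counter(tuple(row[p] for p in parents) for row in data)
        # Maximized log-likelihood: sum of c(node, pa) * log [c(node, pa) / c(pa)].
        loglik += sum(c * math.log(c / marg[key[:-1]]) for key, c in joint.items())
        q = 1
        for p in parents:
            q *= card[p]
        dim += q * (card[node] - 1)  # free parameters for this family
    return loglik - dim / 2 * math.log(n)

# The equivalent structures X -> Y and X <- Y get identical scores on any data set.
data = [{'X': 0, 'Y': 0}, {'X': 0, 'Y': 1}, {'X': 1, 'Y': 1}, {'X': 1, 'Y': 1}]
card = {'X': 2, 'Y': 2}
print(bic({'X': set(), 'Y': {'X'}}, data, card))
print(bic({'X': {'Y'}, 'Y': set()}, data, card))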

18 Outline

1 Model Equivalence
  Conditions for Model Equivalence
  Representing Equivalence Classes of Models
  Model Equivalence and Scoring Functions
2 Model Inclusion
  Model Inclusion and Scoring Functions
3 Optimality Conditions
4 Greedy Equivalence Search (GES)

19 Model Inclusion

BN model S includes model S′ if all conditional independence statements valid in S are valid in S′ as well.

Example: Consider models over X, Y, and Z: the model X → Y with Z disconnected (left) and the chain X → Y → Z (right). All conditional independencies valid in the model on the right are true in the model on the left. The model on the right includes the model on the left.

20 Model Inclusion and Equivalence

S and S′ are equivalent iff S includes S′ and S′ includes S.

We say that S strictly includes S′ if S includes S′ but S′ does not include S.

Exercise: Show that X → Y → Z does not include X → Y ← Z, and that X → Y ← Z does not include X → Y → Z.

21 Model Inclusion

Theorem (9.5) (Chickering 2002b): A BN structure S includes S′ iff there exists a sequence of covered arc reversals and arc additions which converts S′ into S.

Example: Model X → Y → Z includes model X ← Y, Z (with Z disconnected):
- Covered arc reversal: X ← Y, Z ⇒ X → Y, Z.
- Arc addition: X → Y, Z ⇒ X → Y → Z.

Corollary (9.1): A BN structure S includes S′ iff S can represent any joint distribution that S′ can.

22 Inclusion Boundary

The lower inclusion boundary IB−(S) of model S consists of all models S′ such that
- S strictly includes S′, and
- there is no model S′′ such that S strictly includes S′′ and S′′ strictly includes S′.

The upper inclusion boundary IB+(S) of model S consists of all models S′ such that
- S′ strictly includes S, and
- there is no model S′′ such that S′ strictly includes S′′ and S′′ strictly includes S.

The inclusion boundary is IB(S) = IB−(S) ∪ IB+(S).

23 Inclusion Boundary

According to Theorem 9.5:
- IB+(S) consists of all models that can be obtained from S via a series of covered arc reversals, the addition of a single arc, and another series of covered arc reversals.
- IB−(S) consists of all models that can be obtained from S via a series of covered arc reversals, the removal of a single arc, and another series of covered arc reversals.

(A sketch of this recipe follows.)
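As promised above, a sketch of the IB+(S) recipe, reusing is_covered and reverse_covered_arc from the earlier sketch (the representation and helper names are assumptions, not from the lecture):

from itertools import permutations

def frozen(dag):
    # Hashable fingerprint of a DAG, so DAGs can be deduplicated.
    return frozenset((node, frozenset(ps)) for node, ps in dag.items())

def covered_reversal_closure(dag):
    # All DAGs reachable by covered arc reversals, i.e. cl(dag) by Theorem 9.3.
    seen, stack = {frozen(dag): dag}, [dag]
    while stack:
        g = stack.pop()
        for y, parents in g.items():
            for x in list(parents):
                if is_covered(g, x, y):
                    h = reverse_covered_arc(g, x, y)
                    if frozen(h) not in seen:
                        seen[frozen(h)] = h
                        stack.append(h)
    return list(seen.values())

def creates_cycle(dag, x, y):
    # Adding x -> y closes a cycle iff a directed path y ~> x already exists.
    stack, seen = [y], set()
    while stack:
        node = stack.pop()
        if node == x:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(child for child, ps in dag.items() if node in ps)
    return False

def upper_inclusion_boundary(dag):
    # Covered reversals, one acyclic arc addition, covered reversals again.
    out = {}
    for g in covered_reversal_closure(dag):
        for x, y in permutations(g, 2):
            if y not in g[x] and x not in g[y] and not creates_cycle(g, x, y):
                h = {node: set(ps) for node, ps in g.items()}
                h[y].add(x)
                for m in covered_reversal_closure(h):
                    out[frozen(m)] = m
    return list(out.values())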

24 Example of IB

On the next slide, we show all DAGs over three variables and depict the equivalence and inclusion relations. The DAGs are grouped into equivalence classes. Two equivalence classes are adjacent if one can get from one to the other by a single arc addition or deletion.

25 Model Inclusion

[Figure: all DAGs over three variables, grouped into equivalence classes, with adjacent classes connected by single arc additions/deletions]

26 Useless Edge

A fact: Let X and Y be two non-adjacent nodes in a DAG. It is possible to add an edge between X and Y, either X → Y or X ← Y, without creating a directed cycle. (Exercise: prove this)

Consider a joint probability distribution P and a DAG S. Suppose adding an edge X → Y to S does not induce cycles. We say that adding the edge to S is useless w.r.t. P if

X ⊥_P Y | pa_S(Y).

Otherwise we say that adding the edge to S is useful w.r.t. P.
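Operationally, with discrete data one can estimate whether an addition is useless by testing the conditional independence X ⊥ Y | pa_S(Y), e.g. via the empirical conditional mutual information. A sketch (the data format and the threshold tol are assumptions):

import math
from collections import Counter

def conditional_mi(data, x, y, cond):
    # Empirical I(X; Y | cond) = sum p(x,y,z) log [p(x,y,z) p(z) / (p(x,z) p(y,z))].
    n = len(data)
    cxyz = Counter((row[x], row[y], tuple(row[c] for c in cond)) for row in data)
    cxz = Counter((row[x], tuple(row[c] for c in cond)) for row in data)
    cyz = Counter((row[y], tuple(row[c] for c in cond)) for row in data)
    cz = Counter(tuple(row[c] for c in cond) for row in data)
    return sum(c / n * math.log(c * cz[z] / (cxz[(vx, z)] * cyz[(vy, z)]))
               for (vx, vy, z), c in cxyz.items())

def adding_edge_is_useless(data, dag, x, y, tol=1e-3):
    # Useless w.r.t. the empirical distribution: I(X; Y | pa_S(Y)) ≈ 0.
    return conditional_mi(data, x, y, sorted(dag[y])) < tol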

27 Locally Consistent Scoring Functions

Now let P be the joint probability distribution from which the data D were sampled. A scoring function is locally consistent if
- adding an edge that is useful w.r.t. P to a model increases its score, and
- adding an edge that is useless w.r.t. P to a model decreases its score.

28 BIC is Locally Consistent

The BIC score is locally consistent when the sample size is sufficiently large. Recall:

BIC(S | D) = log P(D | S, θ*) − (d/2) log N = −N Σ_i H_P̂(X_i | pa_S(X_i)) − (d/2) log N

where P̂ is the empirical distribution.

Without loss of generality, suppose X_2 ∉ pa_S(X_1) and consider adding the edge X_2 → X_1 to S.

If adding X_2 → X_1 to S is useful w.r.t. P, it must also be useful w.r.t. P̂ when the sample is large enough. Hence

H_P̂(X_1 | pa_S(X_1)) > H_P̂(X_1 | pa_S(X_1), X_2).

Therefore adding the arc to S increases the score when N is large: the difference in the first term grows faster than the difference in the second term.

29 BIC is Locally Consistent (continued from previous slide)

If adding X_2 → X_1 to S is useless w.r.t. P, it must be, when the sample is large enough, useless w.r.t. P̂ except for some random noise. Hence

H_P̂(X_1 | pa_S(X_1)) ≈ H_P̂(X_1 | pa_S(X_1), X_2).

Therefore adding the arc to S decreases the score when N is large: there is (almost) no difference in the first term, but the d in the second term becomes larger.

The Bayesian score and the marginal likelihood (CH) score are also locally consistent when the sample size is sufficiently large: asymptotically, they are the same as BIC.

30 Outline

1 Model Equivalence
  Conditions for Model Equivalence
  Representing Equivalence Classes of Models
  Model Equivalence and Scoring Functions
2 Model Inclusion
  Model Inclusion and Scoring Functions
3 Optimality Conditions
4 Greedy Equivalence Search (GES)

31 Optimality Conditions

If a DAG S is a perfect map of a joint probability distribution P, we say that P is faithful to S.

Theorem (9.6) (Castelo and Kocka 2002, Chickering 2002b): Consider a hill-climbing algorithm Alg that uses scoring function f and is based on data D. Suppose
1 D were sampled from a distribution P that is faithful to a BN model S, and the sample size is sufficiently large;
2 the scoring function f is equivalence invariant and locally consistent;
3 the models that Alg examines at each step include the following:
  (a) all models in the lower inclusion boundary of the current model;
  (b) all models that can be obtained from the current model by adding a single arc.
Then Alg will reach a model in the equivalence class cl(S) and stop there.

32 Some Notes

The conditions are strong; they are always violated in practice.

Compared to the straightforward hill-climbing neighborhood, we evaluate fewer neighbors of some kinds and more of others: we do not evaluate arc reversals, but we have to consider not only arc removals but the whole lower inclusion boundary.

In practice, one usually uses the whole inclusion boundary, and one might even use arc reversals. Empirical evaluations of these possibilities are yet to be done.

33 Proof of Theorem 9.6

Claim 1: Under the conditions of the theorem, if a model S′ is not equivalent to S, then there must exist another model S′′ such that
- S′′ either can be obtained from S′ by adding a single arc or is in the lower boundary of S′, and
- f(S′′) > f(S′).

34 Proof of Theorem 9.6

Claim 1 and the third condition of the theorem imply that if the current model is not equivalent to S, Alg will always find a model that is strictly better than the current model.

Since there are only finitely many possible models, Alg will reach a model and stop there. The final model must be in the equivalence class cl(S); otherwise, Alg would continue according to Claim 1.

35 Proof of Claim 1

Two cases:
1 S′ includes S.
2 S′ does not include S.

Case 1: S′ includes S. By Theorem 9.5, we can reach S from S′ by a series of covered arc reversals and arc deletions. Let S̃ and S′′ be the models we get before and after the FIRST arc removal, respectively. Evidently, S′′ is in the lower boundary of S′. Because f is equivalence invariant, we have f(S′) = f(S̃). It is also clear that the arc removed from S̃ is useless w.r.t. P. Because the scoring function is locally consistent and D is sufficiently large,

f(S′′) > f(S̃) = f(S′).

Hence Claim 1 is true in this case.

36 Proof of Claim 1

Case 2: S′ does not include S. There must exist two nodes X and Y such that
1 X and Y are not adjacent in S′;
2 X and Y are not d-separated by pa_S′(Y) in S;
3 adding the arc X → Y to S′ does not induce directed cycles. (Let S′′ be the resulting model.) (Exercise: prove this)

Because P is faithful to S, the second property above implies that, under P, X and Y are not conditionally independent given pa_S′(Y). Hence adding the arc X → Y to S′ is useful w.r.t. P. Because f is locally consistent,

f(S′′) > f(S′).

Hence Claim 1 is also true in this case. Claim 1 is proved.

37 Some Notes

In Theorem 9.6, the requirement on Alg can be relaxed as follows: at each step, Alg finds a model better than the current model if such models exist.

This relaxation allows one to consider stochastic hill-climbing (Kocka and Castelo 2002), which is more efficient than the standard hill-climber.

38 Outline

1 Model Equivalence
  Conditions for Model Equivalence
  Representing Equivalence Classes of Models
  Model Equivalence and Scoring Functions
2 Model Inclusion
  Model Inclusion and Scoring Functions
3 Optimality Conditions
4 Greedy Equivalence Search (GES)

39 Greedy Equivalence Search (GES)

Greedy equivalence search (GES), proposed by Meek (1997):

Start with the empty model, i.e. the model with no arcs.

Phase I: Repeat the following until a local maximum is reached:
- Examine all models in the UPPER inclusion boundary of the current model.
- Pick the one with the best score.

Phase II: Repeat the following until a local maximum is reached:
- Examine all models in the LOWER inclusion boundary of the current model.
- Pick the one with the best score.

(A schematic sketch of the two phases follows.)
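The control flow of the two phases is simple. A schematic sketch (f, upper_boundary, and lower_boundary are hypothetical helpers, e.g. the boundary generators sketched earlier; none of this is from the lecture):

def ges(empty_model, f, upper_boundary, lower_boundary):
    current = empty_model
    for boundary in (upper_boundary, lower_boundary):  # Phase I, then Phase II
        while True:
            neighbors = boundary(current)
            if not neighbors:
                break
            best = max(neighbors, key=f)
            if f(best) <= f(current):  # local maximum reached
                break
            current = best
    return current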

40 Discussions

In the first phase, the algorithm finds a model that includes the true model. In the second phase, it reduces that model to the true model.

According to Theorem 9.6, one only needs to add arcs in the first phase. Why, then, does the algorithm do more? At some point it goes through some complex graph, in the worst case the complete graph. With finite data it matters how complex the most complex graph is; this decides whether you end up in a local or the global optimum. It is generally believed that doing more than just adding edges might help to keep the most complex graph visited less complex.

41 Implementation Details

Generating all models in the inclusion boundary of a model by directly applying Theorem 9.5 might be computationally expensive: consider the model with 100 disjoint arcs. There are 2^100 equivalent DAGs representing the same model.

Solution: Use essential graphs (one graph) to represent an equivalence class of DAGs. See Chickering (2002b) on how to search inclusion boundaries implicitly by using search operators on essential graphs.

42 Limitations and Empirical Results

Despite the optimality result, local maxima are still a problem: we never have infinite data, and data usually are not generated by joint distributions faithful to DAGs.

The chart (Chickering 2002b) on the next page shows that GES often cannot reconstruct the generative model. But it is much better than hill-climbing with DAGs, D-space (the algorithm we described in the previous lecture). E-space stands for another algorithm based on essential graphs by Chickering (2002).

43 Empirical Results

[Chart (Chickering 2002b): empirical comparison of GES with D-space and E-space search]
