Parameter Learning: Binary Variables
1 Parameter Learning: Binary Variables. SS 2008 Bayesian Networks, Multimedia Computing, Universität Augsburg
2 Reference
Richard E. Neapolitan: Learning Bayesian Networks. Prentice Hall Series in Artificial Intelligence. Chapter 6, "Parameter Learning: Binary Variables". Figures and text are taken from that book.
Prof. Dr. Rainer Lienhart, Multimedia Computing, Institut für Informatik, Universität Augsburg; Rainer.Lienhart@informatik.uni-augsburg.de
3 Augmented Bayesian Network (1)
Goal: extend the theory from learning one parameter to learning many parameters in a BN.
Definition 6.8: An augmented Bayesian network $(G, F, \rho)$ is a Bayesian network determined by the following:
1. A DAG $G = (V, E)$ where $V = \{X_1, \dots, X_n\}$ and each $X_i$ is a random variable.
2. For every $i$, an auxiliary parent variable $F_i$ of $X_i$ and a density function $\rho_i$ of $F_i$. Each $F_i$ is a root and has no edge to any variable except $X_i$. The set of all $F_i$ is denoted by $F$; that is, $F = F_1 \cup F_2 \cup \dots \cup F_n$.
3. For every $i$, for all values $pa_i$ of the parents $PA_i$ in $V$ of $X_i$, and all values $f_i$ of $F_i$, a probability distribution of $X_i$ conditional on $pa_i$ and $f_i$.
An augmented BN is simply a BN; the notation is the only difference. => It encodes our beliefs concerning the unknown conditional relative frequencies (parameters) needed for the DAG.
4 Augmented Bayesian Network (2)
$F_i$ is a set of random variables representing our belief concerning the relative frequencies of the values of $X_i$ given values of the parents of $X_i$. Since all $F_i$ are root nodes, they are mutually independent. Therefore we have global parameter independence:
$\rho(f_1, f_2, \dots, f_n) = \rho(f_1)\,\rho(f_2)\cdots\rho(f_n)$
Theorem 6.6: Let an augmented Bayesian network $(G, F, \rho)$ be given. Then the marginal distribution $P$ of $\{X_1, \dots, X_n\}$ constitutes a Bayesian network with $G$. We say $(G, F, \rho)$ embeds $(G, P)$.
5 Augmented Bayesian Network (3)
Definition 6.9: A binomial augmented Bayesian network $(G, F, \rho)$ is an augmented Bayesian network with the following properties:
1. For every $i$, $X_i$ has space $\{1, 2\}$.
2. For every $i$, there is an ordering $[pa_{i1}, pa_{i2}, \dots, pa_{iq_i}]$ of all instantiations of the parents $PA_i$ in $V$ of $X_i$, where $q_i$ is the number of different instantiations of these parents. Furthermore, for every $i$, $F_i = \{F_{i1}, F_{i2}, \dots, F_{iq_i}\}$, where each $F_{ij}$ is a root, has no edge to any variable except $X_i$, and has density function $\rho_{ij}(f_{ij})$ with $0 \le f_{ij} \le 1$.
3. For every $i$ and $j$, and all values $f_i = \{f_{i1}, \dots, f_{ij}, \dots, f_{iq_i}\}$ of $F_i$,
$P(X_i = 1 \mid pa_{ij}, f_{i1}, \dots, f_{ij}, \dots, f_{iq_i}) = f_{ij}$
6 Example: Figure (network figure from the book, not transcribed)
7 Augmented Bayesian Network (4)
$F_{ij}$ is a random variable whose probability distribution represents our belief concerning the relative frequency with which $X_i$ is equal to 1 given that the parents of $X_i$ are in their $j$-th instantiation. The $F_{ij}$ are all roots in the BN, so they are mutually independent; this gives local parameter independence of the members of each $F_i$:
$\rho(f_{i1}, f_{i2}, \dots, f_{iq_i}) = \rho(f_{i1})\,\rho(f_{i2})\cdots\rho(f_{iq_i})$ for $1 \le i \le n$
Together, global and local parameter independence yield:
$\rho(f_{11}, f_{12}, \dots, f_{nq_n}) = \rho(f_{11})\,\rho(f_{12})\cdots\rho(f_{nq_n})$
8 Augmented Bayesian Network (5)
Theorem 6.7: Let a binomial augmented Bayesian network $(G, F, \rho)$ be given. Then for each $i$ and each $j$, the $ij$-th conditional distribution in the embedded Bayesian network $(G, P)$ is given by
$P(X_i = 1 \mid pa_{ij}) = E(F_{ij})$
Corollary: Let a binomial augmented Bayesian network be given. If each $F_{ij}$ has a beta distribution with parameters $a_{ij}$, $b_{ij}$, $N_{ij} = a_{ij} + b_{ij}$, then for each $i$ and each $j$ the $ij$-th conditional distribution in the embedded network $(G, P)$ is given by
$P(X_i = 1 \mid pa_{ij}) = \dfrac{a_{ij}}{N_{ij}}$
Note: Inference is always done in the embedded BN using only the variables in $V$.
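Since each embedded conditional probability is just the mean of the corresponding beta density, computing the embedded network from the augmented one is a one-liner per parameter. A minimal Python sketch for an assumed two-node network $X_1 \to X_2$ with uniform beta(1, 1) densities (the network, names, and numbers are illustrative, not taken from the slides):

```python
from dataclasses import dataclass

@dataclass
class BetaParam:
    """Density beta(f_ij; a_ij, b_ij) of one auxiliary variable F_ij."""
    a: float
    b: float

    def mean(self) -> float:
        # Theorem 6.7 with beta densities: P(X_i = 1 | pa_ij) = a_ij / N_ij
        return self.a / (self.a + self.b)

# Keys are (node, values of its parents); X1 is a root, X2 has parent X1.
params = {
    ("X1", ()): BetaParam(1, 1),
    ("X2", (1,)): BetaParam(1, 1),
    ("X2", (2,)): BetaParam(1, 1),
}

for (node, pa), fij in params.items():
    print(f"P({node}=1 | parents={pa}) = {fij.mean():.3f}")
```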
9 Bayesian Network Sample
Definition 6.10: Suppose we have a sample of size $M$ as follows:
1. We have the random vectors $X^{(1)} = (X_1^{(1)}, \dots, X_n^{(1)})^T, \dots, X^{(M)} = (X_1^{(M)}, \dots, X_n^{(M)})^T$, with $D = \{X^{(1)}, \dots, X^{(M)}\}$, such that, for every $i$, each $X_i^{(h)}$ has the same space.
2. There is an augmented Bayesian network $(G, F, \rho)$, where $G = (V, E)$, such that for $1 \le h \le M$, $\{X_1^{(h)}, \dots, X_n^{(h)}\}$ constitutes an instance of $V$ in $G$ resulting in a distinct augmented Bayesian network.
Then the sample $D$ is called a Bayesian network sample of size $M$ with parameter $(G, F)$.
Definition 6.11: Suppose we have a Bayesian network sample of size $M$ such that
1. for every $i$, each $X_i^{(h)}$ has the space $\{1, 2\}$;
2. its augmented Bayesian network $(G, F, \rho)$ is binomial.
Then the sample $D$ is called a binomial Bayesian network sample of size $M$ with parameter $(G, F)$.
10 $P(d \mid f_1, \dots, f_n)$
Lemma 6.8: Suppose
1. $D$ is a Bayesian network sample of size $M$ with parameter $(G, F)$;
2. we have a set of values (data) of the $X^{(h)}$ as follows: $x^{(1)} = (x_1^{(1)}, \dots, x_n^{(1)})^T, \dots, x^{(M)} = (x_1^{(M)}, \dots, x_n^{(M)})^T$, with $d = \{x^{(1)}, \dots, x^{(M)}\}$.
Then
$P(d \mid f_1, \dots, f_n) = \prod_{i=1}^{n} \prod_{h=1}^{M} P(x_i^{(h)} \mid pa_i^{(h)}, f_i)$
where $pa_i^{(h)}$ contains the values of the parents of $X_i$ in the $h$-th case.
Lemma 6.10: Suppose, in addition, that $D$ is a binomial Bayesian network sample of size $M$ with parameter $(G, F)$. Then
$P(d \mid f_{11}, \dots, f_{nq_n}) = \prod_{i=1}^{n} \prod_{h=1}^{M} P(x_i^{(h)} \mid pa_i^{(h)}, f_i) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} f_{ij}^{\,s_{ij}} (1 - f_{ij})^{t_{ij}}$
where $M_{ij}$ is the number of cases in which $X_i$'s parents are in their $j$-th instantiation, and of these $M_{ij}$ cases, $s_{ij}$ is the number in which $x_i^{(h)}$ equals 1 and $t_{ij}$ is the number in which it equals 2.
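Lemma 6.10 reduces the likelihood of a binomial BN sample to the counts $s_{ij}$ and $t_{ij}$. A sketch of that computation, again for an assumed two-node network $X_1 \to X_2$ with made-up data (values encoded as 1 and 2, as in the definitions):

```python
# Hypothetical complete data over X1 -> X2; each pair is (x1, x2) with values in {1, 2}.
data = [(1, 1), (1, 2), (2, 1), (1, 1)]

def root_counts(data):
    """s_11 and t_11 for the root X1."""
    s = sum(1 for (x1, _) in data if x1 == 1)
    return s, len(data) - s

def child_counts(data, j):
    """s_2j and t_2j for X2 when its parent X1 is in instantiation j."""
    cases = [x2 for (x1, x2) in data if x1 == j]
    s = sum(1 for x2 in cases if x2 == 1)
    return s, len(cases) - s

# P(d | f) = product over all parameters of f_ij^s_ij * (1 - f_ij)^t_ij   (Lemma 6.10)
f = {"f11": 0.5, "f21": 0.5, "f22": 0.5}
s11, t11 = root_counts(data)
s21, t21 = child_counts(data, 1)
s22, t22 = child_counts(data, 2)
likelihood = (f["f11"] ** s11 * (1 - f["f11"]) ** t11
              * f["f21"] ** s21 * (1 - f["f21"]) ** t21
              * f["f22"] ** s22 * (1 - f["f22"]) ** t22)
print(likelihood)   # 0.5 ** 8 = 1/256 for this data
```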
11 Posterior Global Parameter Independence
Theorem 6.8: Suppose we have the conditions in Lemma 6.8. Then
$P(d) = \prod_{i=1}^{n} \int \prod_{h=1}^{M} P(x_i^{(h)} \mid pa_i^{(h)}, f_i)\, \rho_i(f_i)\, df_i$
Theorem 6.9 (Posterior Global Parameter Independence): Suppose we have the conditions in Lemma 6.8. Then the $F_i$ are mutually independent conditional on $D$:
$\rho(f_1, \dots, f_n \mid d) = \prod_{i=1}^{n} \rho(f_i \mid d)$
For a binomial Bayesian network sample (conditions of Lemma 6.10) this specializes to
$P(d) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} E\!\left(F_{ij}^{\,s_{ij}} (1 - F_{ij})^{t_{ij}}\right)$
and, if each $F_{ij}$ has a beta distribution with parameters $a_{ij}$, $b_{ij}$, $N_{ij} = a_{ij} + b_{ij}$,
$P(d) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(N_{ij})}{\Gamma(N_{ij} + M_{ij})} \cdot \frac{\Gamma(a_{ij} + s_{ij})\, \Gamma(b_{ij} + t_{ij})}{\Gamma(a_{ij})\, \Gamma(b_{ij})}$
Likewise, under the conditions of Lemma 6.10 the $F_{ij}$ are mutually independent conditional on $D$:
$\rho(f_{11}, f_{12}, \dots, f_{nq_n} \mid d) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \rho(f_{ij} \mid d)$
with
$\rho(f_{ij} \mid d) = \dfrac{f_{ij}^{\,s_{ij}} (1 - f_{ij})^{t_{ij}}\, \rho(f_{ij})}{E\!\left(F_{ij}^{\,s_{ij}} (1 - F_{ij})^{t_{ij}}\right)}$
If $\rho(f_{ij}) = \mathrm{beta}(f_{ij}; a_{ij}, b_{ij})$, then $\rho(f_{ij} \mid d) = \mathrm{beta}(f_{ij}; a_{ij} + s_{ij},\, b_{ij} + t_{ij})$.
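With beta densities both the posterior and the marginal likelihood $P(d)$ are closed-form. A sketch with illustrative counts; `math.lgamma` evaluates the Gamma-function ratios stably:

```python
from math import lgamma, exp

def beta_update(a, b, s, t):
    """Posterior of F_ij after s occurrences of X_i = 1 and t of X_i = 2."""
    return a + s, b + t

def log_marginal(a, b, s, t):
    """log of Gamma(N)/Gamma(N+M) * Gamma(a+s)Gamma(b+t)/(Gamma(a)Gamma(b))."""
    N, M = a + b, s + t
    return (lgamma(N) - lgamma(N + M)
            + lgamma(a + s) - lgamma(a)
            + lgamma(b + t) - lgamma(b))

# One parameter with a uniform beta(1, 1) prior and counts s = 3, t = 1:
a1, b1 = beta_update(1, 1, 3, 1)
print(a1, b1, a1 / (a1 + b1))         # beta(4, 2); updated P(X=1 | d) = 2/3
print(exp(log_marginal(1, 1, 3, 1)))  # this parameter's factor of P(d): 1/20
```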
12 Theorem 6.10
Definition 6.12: The augmented Bayesian network $(G, F, \rho \mid d)$, with $\rho \mid d$ denoting $\rho(f_1, \dots, f_n \mid d)$, is called the updated augmented Bayesian network relative to the Bayesian network sample and the data $d$. The network it embeds is called the updated embedded Bayesian network relative to the Bayesian network sample and the data $d$.
Theorem 6.10: Suppose the conditions in Lemma 6.8 hold, and we create a Bayesian network sample of size $M+1$ by including another random vector
$X^{(M+1)} = (X_1^{(M+1)}, \dots, X_n^{(M+1)})^T$
Then if $D$ is the Bayesian network sample of size $M$, the updated distribution $P(x_1^{(M+1)}, \dots, x_n^{(M+1)} \mid d)$ is the probability distribution in the updated embedded Bayesian network.
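One consequence worth making concrete: updating case by case, each time predicting the next case from the current updated network, ends in the same place as one batch update. A sketch for a single root parameter with made-up observations:

```python
# Sequential vs. batch updating for one root parameter (illustrative values).
cases = [1, 1, 2, 1]        # observed values of X1, encoded in {1, 2}
a, b = 1.0, 1.0             # beta(1, 1) prior
for x in cases:
    p_next = a / (a + b)    # P(next case = 1 | data so far), in the updated network
    print(f"P(next = 1) = {p_next:.3f}, then observe {x}")
    if x == 1:
        a += 1
    else:
        b += 1
print(a, b)                 # beta(4, 2): same as one batch update with s = 3, t = 1
```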
13 Example 6.x (1) (worked-example figure, not transcribed)
14 Example 6.x (2) (worked-example figure, not transcribed)
15 Example 6.x (3) (worked-example figure, not transcribed)
16 Example 6.x (4) (worked-example figure, not transcribed)
17 Sample Size Problem (1)
Example (figure): a small network is updated with data and the resulting value of $P(X_1 = 1 \mid \cdot)$ looks weird; the numeric details are not recoverable from the transcription.
18 Sample Size Problem (2)
Example (figure): the same data processed in a Markov-equivalent DAG yields a different value of $P(X_1 = 1 \mid \cdot)$. The two graphs are equivalent: how can the results be different?
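The discrepancy can be reproduced numerically. A sketch assuming beta(1, 1) densities on every parameter of both DAGs (so the sizes $N_{ij}$ are not consistent with any single prior sample) and made-up data:

```python
# Markov-equivalent DAGs G1: X1 -> X2 and G2: X2 -> X1, beta(1, 1) everywhere.
data = [(1, 1), (1, 2), (2, 1), (1, 1)]   # hypothetical (x1, x2) pairs, values in {1, 2}
M = len(data)

def post_mean(a, b, s, t):
    """Updated P(X = 1 | d) for one parameter: (a + s) / (a + b + s + t)."""
    return (a + s) / (a + b + s + t)

# G1: P(X1=1 | d) comes from the root parameter of X1 alone.
s11 = sum(1 for (x1, _) in data if x1 == 1)
p_g1 = post_mean(1, 1, s11, M - s11)

# G2: P(X1=1 | d) = sum over x2 of P(X2=x2 | d) * P(X1=1 | x2, d).
s2 = sum(1 for (_, x2) in data if x2 == 1)
p2 = post_mean(1, 1, s2, M - s2)
p_g2 = 0.0
for x2, p_x2 in ((1, p2), (2, 1 - p2)):
    cases = [x1 for (x1, c2) in data if c2 == x2]
    s = sum(1 for x1 in cases if x1 == 1)
    p_g2 += p_x2 * post_mean(1, 1, s, len(cases) - s)

print(p_g1, p_g2)   # 0.667 vs 0.622: same independencies, different updated beliefs
```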
19 Equivalent Sample Size (1)
Example (figure): the computation of $P(X_1 = 1 \mid \cdot)$ redone with beta priors of equivalent sample size; numeric details not recoverable.
20 Equivalent Sample Size (2)
Example (figure): the same computation in the Markov-equivalent DAG, again with equivalent-sample-size priors; numeric details not recoverable.
21 Equivalent Sample Size (3)
The idea is to specify values for $a_{ij}$ and $b_{ij}$ that could actually occur in a sample exhibiting the conditional independencies entailed by the DAG.
Definition 6.13: Suppose we have a binomial augmented Bayesian network in which the density functions are $\mathrm{beta}(f_{ij}; a_{ij}, b_{ij})$ for all $i$ and $j$. If there is a number $N$ such that, for all $i$ and $j$,
$N_{ij} = a_{ij} + b_{ij} = P(pa_{ij}) \times N$
then the network is said to have equivalent sample size $N$. In the case of a root, $PA_i$ is empty, $q_i = 1$, and $P(pa_{i1}) = 1$.
Theorem 6.13: Suppose we specify $G$, $F$, and $N$ and assign, for all $i$ and $j$,
$a_{ij} = b_{ij} = \dfrac{N}{2 q_i}$
Then the resulting augmented Bayesian network has equivalent sample size $N$, and the probability distribution in the resulting embedded BN is uniform.
Theorem 6.14: Suppose we specify $G$, $F$, $N$, and a BN $(G, P)$, and assign, for all $i$ and $j$,
$a_{ij} = P(X_i = 1 \mid pa_{ij}) \times P(pa_{ij}) \times N$
$b_{ij} = P(X_i = 2 \mid pa_{ij}) \times P(pa_{ij}) \times N$
Then the resulting augmented BN has equivalent sample size $N$, and it embeds the originally specified BN.
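The two assignments are easy to mechanize. A sketch (function names and numbers are mine, not the book's):

```python
def uniform_ess_priors(q, N):
    """Uniform case: a_ij = b_ij = N / (2 * q_i) for a node with q_i parent
    instantiations gives equivalent sample size N and a uniform embedded BN."""
    return [(N / (2 * q), N / (2 * q)) for _ in range(q)]

def ess_priors_from_bn(p_x1_given_pa, p_pa, N):
    """General case: a_ij = P(X_i=1 | pa_ij) P(pa_ij) N,
    b_ij = P(X_i=2 | pa_ij) P(pa_ij) N."""
    return [(p1 * ppa * N, (1 - p1) * ppa * N)
            for p1, ppa in zip(p_x1_given_pa, p_pa)]

print(uniform_ess_priors(q=2, N=2))                # [(0.5, 0.5), (0.5, 0.5)]
print(ess_priors_from_bn([0.8, 0.3], [0.4, 0.6], N=10))
# [(3.2, 0.8), (1.8, 4.2)]: N_ij = a_ij + b_ij = P(pa_ij) * N, as required
```

With $N = 2$ the uniform case reproduces the prior-indifference recipe on the "Expressing Prior Indifference" slide below: roots get beta(1, 1) and a node with $q_i$ parent instantiations gets beta($1/q_i$, $1/q_i$).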
22 Equivalent Sample Size (4)
Definition 6.14: Binomial augmented BNs $(G_1, F^{(G_1)}, \rho^{(G_1)})$ and $(G_2, F^{(G_2)}, \rho^{(G_2)})$ are called equivalent if they satisfy the following:
1. $G_1$ and $G_2$ are Markov equivalent.
2. The probability distributions in their embedded BNs are the same.
3. The specified density functions in both are beta.
4. They have the same equivalent sample size.
Theorem 6.15: Suppose we have two equivalent binomial augmented Bayesian networks $(G_1, F^{(G_1)}, \rho^{(G_1)})$ and $(G_2, F^{(G_2)}, \rho^{(G_2)})$. Let $D$ be a set of random vectors as specified in Definition 6.10. Then, given any set $d$ of values of the vectors in $D$, the updated embedded Bayesian network relative to $D$ and the data $d$ obtained by considering $D$ a binomial BN sample with parameter $(G_1, F^{(G_1)})$ contains the same probability distribution as the one obtained by considering $D$ a binomial BN sample with parameter $(G_2, F^{(G_2)})$.
=> As long as we use an equivalent sample size, our updated probability distribution does not depend on which equivalent DAG we use to represent a set of conditional independencies.
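Re-running the slide-18 comparison with priors of equivalent sample size $N = 2$ shows the theorem in action (same made-up data as before):

```python
# Same two DAGs and data as before, now with equivalent sample size N = 2:
# roots get beta(1, 1); a child with q = 2 parent instantiations gets beta(1/2, 1/2).
data = [(1, 1), (1, 2), (2, 1), (1, 1)]   # hypothetical pairs, values in {1, 2}
M = len(data)

def post_mean(a, b, s, t):
    return (a + s) / (a + b + s + t)

s11 = sum(1 for (x1, _) in data if x1 == 1)
p_g1 = post_mean(1, 1, s11, M - s11)       # G1: X1 is a root

s2 = sum(1 for (_, x2) in data if x2 == 1)
p2 = post_mean(1, 1, s2, M - s2)           # G2: X2 is a root
p_g2 = 0.0
for x2, p_x2 in ((1, p2), (2, 1 - p2)):
    cases = [x1 for (x1, c2) in data if c2 == x2]
    s = sum(1 for x1 in cases if x1 == 1)
    p_g2 += p_x2 * post_mean(0.5, 0.5, s, len(cases) - s)   # beta(1/2, 1/2)

print(p_g1, p_g2)   # both 0.666...: the updated distributions now agree
```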
23 Expressing Prior Indifference
Use $N = 2$: root nodes get $\mathrm{beta}(f; 1, 1)$; other nodes, with $q_i$ different combinations of parent values, get $\mathrm{beta}(f; 1/q_i, 1/q_i)$. In this way, the total sample size at each node is always 2.
24 Learning with Randomly Missing Data. SS 2008 Bayesian Networks, Multimedia Computing, Universität Augsburg
25 Learning with Data
Network $X_1 \to X_2$ with priors $\mathrm{beta}(f_{11}; 2, 2)$, $\mathrm{beta}(f_{21}; 1, 1)$, $\mathrm{beta}(f_{22}; 1, 1)$ (equivalent sample size $N = 4$), so initially $P(X_1 = 1) = 1/2$, $P(X_2 = 1 \mid X_1 = 1) = 1/2$, $P(X_2 = 1 \mid X_1 = 2) = 1/2$.
Data: five complete cases with counts $s_{11} = 4$, $t_{11} = 1$; $s_{21} = 3$, $t_{21} = 1$; $s_{22} = 0$, $t_{22} = 1$.
Updated densities: $\mathrm{beta}(f_{11}; 6, 3)$, $\mathrm{beta}(f_{21}; 4, 2)$, $\mathrm{beta}(f_{22}; 1, 2)$, giving $P(X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 2) = 1/3$.
26 Learning with Missing Data
Assumption: data items are missing randomly.
Same network and priors as before, but now $X_2$ is missing in two of the five cases (one with $x_1 = 1$, one with $x_1 = 2$). Each incomplete case is split into two fractional occurrences of weight $1/2$ (the current value of $P(X_2 \mid x_1)$), giving expected counts $s'_{21} = 2.5$, $t'_{21} = 1.5$, $s'_{22} = 0.5$, $t'_{22} = 0.5$.
Updated densities: $\mathrm{beta}(f_{11}; 6, 3)$, $\mathrm{beta}(f_{21}; 7/2, 5/2)$, $\mathrm{beta}(f_{22}; 3/2, 3/2)$, giving $P(X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 1) = 7/12$, $P(X_2 = 1 \mid X_1 = 2) = 1/2$.
27 What We Computed
We set $f' = \{f'_{11}, f'_{21}, f'_{22}\} = E(f) = \{1/2, 1/2, 1/2\}$ (the prior means) and computed
$s'_{21} = E(s_{21} \mid d, f') = \sum_{h=1}^{M} P(X_1^{(h)} = 1, X_2^{(h)} = 1 \mid x^{(h)}, f') = 1 + 1 + 0 + \tfrac{1}{2} + 0 = \tfrac{5}{2}$
$t'_{21} = E(t_{21} \mid d, f') = 0 + 0 + 1 + \tfrac{1}{2} + 0 = \tfrac{3}{2}$
=> These estimates are based only on the 'data' in our prior sample, because $f'$ holds the prior means; they are not based on the data $d$.
28 Recall
The updated network from slide 26: $\mathrm{beta}(f_{11}; 6, 3)$, $\mathrm{beta}(f_{21}; 7/2, 5/2)$, $\mathrm{beta}(f_{22}; 3/2, 3/2)$, with $P(X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 1) = 7/12$, $P(X_2 = 1 \mid X_1 = 2) = 1/2$.
29 What We Should Compute
Incorporate the data $d$ in our estimates: set $f' = \{f'_{11}, f'_{21}, f'_{22}\} = \{2/3, 7/12, 1/2\}$, the means of the updated densities. Then
$s'_{21} = E(s_{21} \mid d, f') = \sum_{h=1}^{M} P(X_1^{(h)} = 1, X_2^{(h)} = 1 \mid x^{(h)}, f') = 1 + 1 + 0 + \tfrac{7}{12} + 0 = \tfrac{31}{12}$
$t'_{21} = E(t_{21} \mid d, f') = 0 + 0 + 1 + \tfrac{5}{12} + 0 = \tfrac{17}{12}$
=> Under certain conditions, the limit approached by $f'$ when this procedure is repeated is a value $\tilde f$ of $f$ that locally maximizes $\rho(f \mid d)$.
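A compact sketch of this repeated procedure on the slides' example (the ordering of the five cases and the encoding of a missing $X_2$ as `None` are my assumptions; each iteration computes expected counts given the current estimates, then takes the means of the updated beta densities):

```python
# Iterative estimation for X1 -> X2 with priors beta(f11; 2, 2),
# beta(f21; 1, 1), beta(f22; 1, 1), and X2 missing in two of five cases.
data = [(1, 1), (1, 1), (1, 2), (1, None), (2, None)]
prior = {"f11": (2.0, 2.0), "f21": (1.0, 1.0), "f22": (1.0, 1.0)}
f = {k: a / (a + b) for k, (a, b) in prior.items()}   # start from the prior means

def em_step(f):
    """Expected counts given current f, then means of the updated betas."""
    s = {k: 0.0 for k in f}
    t = {k: 0.0 for k in f}
    for x1, x2 in data:
        (s if x1 == 1 else t)["f11"] += 1
        key = "f21" if x1 == 1 else "f22"
        if x2 is not None:
            (s if x2 == 1 else t)[key] += 1
        else:                  # missing X2: fractional occurrence P(X2=1 | x1, f)
            s[key] += f[key]
            t[key] += 1 - f[key]
    return {k: (prior[k][0] + s[k]) / (prior[k][0] + prior[k][1] + s[k] + t[k])
            for k in f}

for step in range(30):
    f = em_step(f)
    if step < 2:
        print(f)   # first iterate: f21 = 7/12; second: f21 = 43/72
print(f)           # approaches a local maximizer of rho(f | d)
```

The first two iterates reproduce the slides' values $f'_{21} = 7/12$ and $43/72$; the fixed point of this sketch is $f_{21} = 3/5$. If $X_1$ were the missing value instead, the fractional weights would have to come from $P(X_1 \mid x_2, f)$ via Bayes' rule; the scheme is otherwise unchanged.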
30 Recall
Recomputing with $f' = \{2/3,\, 7/12,\, 1/2\}$: the incomplete case with $x_1 = 1$ now splits with weights $7/12$ and $5/12$ (the case with $x_1 = 2$ still splits $1/2$ and $1/2$), giving $s'_{21} = 2 + 7/12 = 31/12$, $t'_{21} = 1 + 5/12 = 17/12$, $s'_{22} = 0.5$, $t'_{22} = 0.5$.
Updated densities: $\mathrm{beta}(f_{21}; 43/12, 29/12)$ and $\mathrm{beta}(f_{22}; 3/2, 3/2)$, giving $P(X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 1) = 43/72$, $P(X_2 = 1 \mid X_1 = 2) = 1/2$.
31 Repeat Update
Repeating with $f' = \{2/3,\, 43/72,\, 1/2\}$: the incomplete case with $x_1 = 1$ splits with weights $43/72$ and $29/72$, giving $s'_{21} = 2 + 43/72 = 187/72$ and $t'_{21} = 1 + 29/72 = 101/72$, while $s'_{22} = 0.5$, $t'_{22} = 0.5$.
Updated density: $\mathrm{beta}(f_{21}; 1 + 187/72,\, 1 + 101/72)$; the slide leaves the new value $P(X_2 = 1 \mid X_1 = 1) = {?}$ to be computed, and the procedure is repeated until convergence. $P(X_1 = 1) = 2/3$ and $P(X_2 = 1 \mid X_1 = 2) = 1/2$ are unchanged.
32 Algorithm 6.1 (1) (algorithm listing from the book, not transcribed)
33 Algorithm 6.1 (2) (algorithm listing from the book, not transcribed)