Probabilistic Graphical Models. Bayesian Networks: Definition and Basic Properties

Probabilistic Graphical Models. Bayesian Networks: Definition and Basic Properties. Renato Martins Assunção, DCC, UFMG, 2015.

What's the use of a BN? Y = (Y_1, Y_2, ..., Y_k) is a random vector composed of k random variables. If they are binary, the sample space Ω has 2^k elements. Specifying P(Y = y) means assigning a value to each of the 2^k possible configurations of y, which is not practical even if k is moderate. A BN simplifies this task by using only local influences, and it also allows probabilities to be calculated more efficiently than O(2^k).
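To see the size of the saving, here is a minimal sketch in plain R (illustrative numbers, not from the slides; it assumes every variable is binary and that each node has at most two parents) comparing the number of probabilities in the full joint table with the number needed by the local CPTs.

```r
# Illustrative sketch: cost of the full joint table vs. local CPTs,
# assuming k binary variables and at most 2 parents per node.
k <- 20
full_joint <- 2^k - 1          # one probability per configuration of (Y_1, ..., Y_k), minus the sum-to-1 constraint
n_parents  <- rep(2, k)        # assumed number of parents of each node
bn_params  <- sum(2^n_parents) # one P(node = 1 | parent configuration) per parent configuration
c(full_joint = full_joint, bn = bn_params)
# full_joint = 1048575, bn = 80
```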

Local influence. The causal diagram is represented in the form of a graph. We calculate the probability of a random variable conditioned ONLY on its parents, that is, we look only at its most immediate ancestors, which have a direct influence on it. There is no need to look at grandparents, great-grandparents, and so on; the parents are enough to obtain the joint distribution of all the variables in the graph.

A BN is a model for the JOINT probability distribution of a random vector Y = (Y_1, Y_2, ..., Y_k), which can have a very large dimension. Basic idea: represent the causal relationships among the variables in Y by means of a directed graph. The representation is only an approximation of reality.

An example. Figure: five variables: Friends?, Rain?, Game, Clothes Change?, Afternoon Activity.

DAG. The graph must be a DAG: directed and acyclic. DAG: directed acyclic graph. Acyclic: no directed cycles, that is, no loop whose edges all point in the same direction along the way.
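To make the "no directed cycles" condition concrete, the sketch below (plain R, not part of the slides) checks whether an edge list describes a DAG by repeatedly removing nodes with no incoming edges; if at some point every remaining node still has a parent, the graph contains a directed cycle.

```r
# Sketch: test whether a directed graph (edge list) is acyclic by
# repeatedly peeling off nodes with no incoming edges (Kahn's algorithm).
is_dag <- function(nodes, edges) {
  remaining <- nodes
  repeat {
    if (length(remaining) == 0) return(TRUE)   # everything removed: no cycle
    # nodes in 'remaining' that are the target of an edge whose source is still present
    has_parent <- remaining %in% edges$to[edges$from %in% remaining]
    sources <- remaining[!has_parent]
    if (length(sources) == 0) return(FALSE)    # every remaining node has a parent: directed cycle
    remaining <- setdiff(remaining, sources)
  }
}

# A -> B, A -> C, B -> C is a DAG:
is_dag(c("A", "B", "C"), data.frame(from = c("A", "A", "B"), to = c("B", "C", "C")))  # TRUE
# A -> B -> C -> A has a directed cycle:
is_dag(c("A", "B", "C"), data.frame(from = c("A", "B", "C"), to = c("B", "C", "A")))  # FALSE
```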

Examples of DAGs (figure).

Examples of non-DAGs (figure).

Back to BNs. Specify a DAG representing the causal relationships between the random variables; each variable is represented by a node in the DAG. Specify a probability distribution for each node conditioned on the values of its parent variables; this is called a CPT: conditional probability table (or distribution). And voilà, that's all, folks...

Daphne Koller example (from Coursera) (figure).

DAG + CPTs (figure).

Describing CPTs (figure).

Bayesian Network. These two ingredients, the DAG and the set of CPTs, are combined by the chain rule. The result is the BAYESIAN NETWORK: we obtain the JOINT distribution of ALL the variables.

Chain rule with two events. A and B are events. P(B | A) = P(A ∩ B) / P(A). Hence, P(A ∩ B) = P(A) P(B | A). This is always valid, for any pair of events, and the rule can also be used for two random variables.
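For example, when two cards are drawn without replacement from a standard deck, with A = "the first card is an ace" and B = "the second card is an ace", the rule gives P(A ∩ B) = P(A) P(B | A) = (4/52)(3/51) = 1/221.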

Chain rule with two discrete random variables. Case 1: X and Y are discrete random variables. Take any two possible values x and y of X and Y, for example x = 0 and y = 2. These define two events: A = [X = 0] and B = [Y = 2]. Applying the chain rule to these two events, we have

Chain rule with two discrete random variables. Applying the chain rule to these two events, we have P(X = 0, Y = 2) = P(A ∩ B) = P(A) P(B | A) = P(X = 0) P(Y = 2 | X = 0). This is valid for ANY choice of x and y, and therefore P(Y = y, X = x) = P(X = x) P(Y = y | X = x) for all values x and y in the support of the distributions.
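A quick numerical check of this identity (plain R, made-up joint probabilities), using the same values x = 0 and y = 2 as in the slide:

```r
# Made-up joint table for two discrete variables X (rows) and Y (columns).
joint <- matrix(c(0.10, 0.20, 0.30,
                  0.15, 0.05, 0.20),
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c("0", "1"), Y = c("0", "1", "2")))
p_x0        <- sum(joint["0", ])            # P(X = 0) = 0.6
p_y2_given0 <- joint["0", "2"] / p_x0       # P(Y = 2 | X = 0) = 0.5
p_x0 * p_y2_given0                          # 0.3, equal to joint["0", "2"]
```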

Chain rule with two continuous random variables. The result is also valid for continuous random variables. In this case, the joint probability density function factors as the product f_{XY}(x, y) = f_X(x) f_{Y|X}(y | x). Dropping the subscripts, we can write more concisely f(x, y) = f(x) f(y | x).

Chain rule with 3 events. Three events: P(A ∩ B ∩ C). Take D = A ∩ B as a single event and apply the chain rule to the two events D and C: P(A ∩ B ∩ C) = P(D) P(C | D) = P(A ∩ B) P(C | A ∩ B). Applying the chain rule again to P(A ∩ B): = P(A) P(B | A) P(C | A ∩ B).

Chain rule with 3 RANDOM VARIABLES. Three discrete random variables X, Y, Z and a configuration x, y, z of possible values. This defines three events: A = [X = x], B = [Y = y], C = [Z = z]. Applying the chain rule to these events, we obtain the JOINT distribution P(X = x, Y = y, Z = z) = P(X = x) P(Y = y | X = x) P(Z = z | X = x, Y = y). It is easy to see that this generalizes to n events.
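The same kind of numerical check for three variables (plain R, made-up joint probabilities stored in a 2 x 2 x 2 array):

```r
# Made-up joint table for three binary variables X, Y, Z.
joint <- array(c(0.08, 0.12, 0.10, 0.20,
                 0.15, 0.05, 0.18, 0.12),
               dim = c(2, 2, 2),
               dimnames = list(X = c("0", "1"), Y = c("0", "1"), Z = c("0", "1")))
p_x    <- sum(joint["0", , ])                            # P(X = 0)
p_y_x  <- sum(joint["0", "1", ]) / p_x                   # P(Y = 1 | X = 0)
p_z_xy <- joint["0", "1", "1"] / sum(joint["0", "1", ])  # P(Z = 1 | X = 0, Y = 1)
p_x * p_y_x * p_z_xy                                     # equals joint["0", "1", "1"]
```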

Chain rule in general. n events A_1, A_2, ..., A_n. Writing B = A_1 ∩ ... ∩ A_{n-1}: P(A_1 ∩ ... ∩ A_{n-1} ∩ A_n) = P(B) P(A_n | B) = P(A_1 ∩ ... ∩ A_{n-1}) P(A_n | A_1 ∩ ... ∩ A_{n-1}) = P(A_1 ∩ ... ∩ A_{n-2}) P(A_{n-1} | A_1 ∩ ... ∩ A_{n-2}) P(A_n | A_1 ∩ ... ∩ A_{n-1}) = ... = P(A_1) P(A_2 | A_1) P(A_3 | A_1 ∩ A_2) ... P(A_n | A_1 ∩ ... ∩ A_{n-1}).

Chain rule in general. P(A_1 ∩ ... ∩ A_{n-1} ∩ A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_1 ∩ A_2) ... P(A_n | A_1 ∩ ... ∩ A_{n-1}). Why is this rule important? Because it gives a simple way to calculate a probability involving many simultaneous events. Rather than calculating the probability of all the events at once, we break the task down into a sequence of probability calculations. First, we order the events (above, we used the natural order 1, 2, ..., n). Then we calculate the probability of each SINGLE event A_i CONDITIONED on the previous events A_1, A_2, ..., A_{i-1}. This is usually easier than the simultaneous calculation.

Chain rule for n random variables. n discrete random variables X_1, X_2, ..., X_n and a configuration x_1, x_2, ..., x_n of possible values for these r.v.'s. This defines n events: A_1 = [X_1 = x_1], A_2 = [X_2 = x_2], ..., A_n = [X_n = x_n]. Applying the chain rule to these n events, we obtain the JOINT distribution P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = P(X_1 = x_1) P(X_2 = x_2 | X_1 = x_1) ... P(X_n = x_n | X_1 = x_1, ..., X_{n-1} = x_{n-1}). This is ALWAYS valid, for any set of n random variables.

Chain rule and BN. We ALWAYS have P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = P(X_1 = x_1) P(X_2 = x_2 | X_1 = x_1) ... P(X_n = x_n | X_1 = x_1, ..., X_{n-1} = x_{n-1}). Note that the last r.v. X_n is conditioned on ALL THE n-1 PREVIOUS r.v.'s. The main idea of a BN is to reduce this dependence on the ancestors drastically: the BN model ASSUMES THAT, in the chain rule factorization, each random variable depends ONLY ON ITS PARENT NODES. Older ancestors can be ignored once we condition on the parents: ignore grandparents, great-grandparents, etc.
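For instance, with the five variables D, I, G, S, L of the Daphne Koller example used in the following slides, and the ordering D, I, G, S, L, the full chain rule gives P(D, I, G, S, L) = P(D) P(I | D) P(G | D, I) P(S | D, I, G) P(L | D, I, G, S). If the DAG says that D and I have no parents, G has parents D and I, S has parent I, and L has parent G (the structure implied by the factorization on the "Calculating a probability" slide), the BN assumption reduces this to P(D) P(I) P(G | D, I) P(S | I) P(L | G).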

Chain rule and BN (figures).

And where do the probabilities come from? Figure: the probabilities come from the CPTs, the conditional probability tables.

Calculating a probability. P(D = d0, I = i1, G = g3, S = s1, L = l1) = P(D = d0) P(I = i1) P(G = g3 | D = d0, I = i1) P(S = s1 | I = i1) P(L = l1 | G = g3).
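Evaluating the product is just a multiplication of five CPT entries. A minimal sketch in plain R, where the numeric values are illustrative stand-ins and are not read from the CPT figure:

```r
# Multiply the CPT entries appearing in the factorization above.
# The numbers below are made up for illustration only.
p_d0      <- 0.6    # P(D = d0)
p_i1      <- 0.3    # P(I = i1)
p_g3_d0i1 <- 0.02   # P(G = g3 | D = d0, I = i1)
p_s1_i1   <- 0.8    # P(S = s1 | I = i1)
p_l1_g3   <- 0.01   # P(L = l1 | G = g3)
p_d0 * p_i1 * p_g3_d0i1 * p_s1_i1 * p_l1_g3   # P(d0, i1, g3, s1, l1)
```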

Definition of BN.
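In words, summarizing the previous slides: a Bayesian network consists of (i) a DAG with one node per random variable X_1, ..., X_n and (ii) one CPT per node, giving the distribution of that node conditioned on its parents. Together they define the joint distribution P(X_1 = x_1, ..., X_n = x_n) = P(X_1 = x_1 | parents of X_1) ... P(X_n = x_n | parents of X_n), that is, the chain rule factorization with each variable conditioned only on its parents.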

In R. Use the file bnlearn Intro.R to learn the first steps for defining a BN in R. You will also need the file survey.txt.
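For orientation, a hedged sketch of what such a first script typically looks like with bnlearn; the node names below (A, S, E, O, R, T) are assumptions borrowed from the standard bnlearn survey example, and must be replaced by whatever column names survey.txt actually contains.

```r
# Hedged sketch of first steps with bnlearn. Node names are assumptions:
# they must match the column names actually found in survey.txt.
library(bnlearn)

survey <- read.table("survey.txt", header = TRUE, colClasses = "factor")

# Define a DAG by listing each node with its parents; e.g. [E|A:S] means
# "E has parents A and S".
dag <- model2network("[A][S][E|A:S][O|E][R|E][T|O:R]")

# Estimate one CPT per node from the data (maximum likelihood by default).
fit <- bn.fit(dag, data = survey)
fit$E   # inspect the CPT of node E, i.e. P(E | A, S)
```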