Tópicos Especiais em Modelagem e Análise - Aprendizado por Máquina CPS863

Tópicos Especiais em Modelagem e Análise - Aprendizado por Máquina CPS863 (Special Topics in Modeling and Analysis - Machine Learning). Daniel, Edmundo, Rosa. Third trimester of 2012. UFRJ - COPPE, Programa de Engenharia de Sistemas e Computação.

Bayesian Networks: Definition. A Bayesian network B is a directed acyclic graph (DAG) that represents a joint probability distribution over a set of random variables.

Bayesian Networks: Definition. The network is defined by a pair B = (G, Θ), where G is the DAG whose nodes $X_1, X_2, \ldots, X_n$ represent the random variables and whose edges represent the direct dependences between these variables. If the variable represented by a node is observed, the node is an evidence node; otherwise it is a hidden node.

Bayesian Networks: Definition. The graph G encodes independence assumptions, by which each variable $X_i$ is independent of its nondescendants given its parents in G. Θ is the set of parameters of the network: it contains a parameter $\theta_{x_i \mid \pi_i} = P(x_i \mid \pi_i)$ for each realization $x_i$ of $X_i$ given each configuration $\pi_i$ of its parents.

Bayesian Networks: Definition. B defines a unique joint probability distribution over V:
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi_i) = \prod_{i=1}^{n} \theta_{X_i \mid \pi_i}$$
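
To make the factorization concrete, here is a minimal Python sketch that evaluates the joint probability as a product of CPT entries. The three-node network (Rain, Sprinkler, WetGrass) and all CPT numbers are illustrative assumptions, not taken from the slides.

# Minimal sketch of the factorization P(X1..Xn) = prod_i P(Xi | parents(Xi)).
# The network and all probabilities below are hypothetical.
parents = {"Rain": [], "Sprinkler": [], "WetGrass": ["Rain", "Sprinkler"]}

# CPTs: map (value, parent values...) -> probability.
cpt = {
    "Rain":      {(True,): 0.2, (False,): 0.8},
    "Sprinkler": {(True,): 0.1, (False,): 0.9},
    "WetGrass": {
        (True,  True,  True):  0.99, (False, True,  True):  0.01,
        (True,  True,  False): 0.90, (False, True,  False): 0.10,
        (True,  False, True):  0.80, (False, False, True):  0.20,
        (True,  False, False): 0.05, (False, False, False): 0.95,
    },
}

def joint(assignment):
    """P(assignment) = product over nodes of theta_{x_i | pi_i}."""
    p = 1.0
    for node, value in assignment.items():
        key = (value,) + tuple(assignment[q] for q in parents[node])
        p *= cpt[node][key]
    return p

print(joint({"Rain": True, "Sprinkler": False, "WetGrass": True}))
# 0.2 * 0.9 * 0.90 = 0.162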

Conditional Independence.
Independence: $X_A \perp X_B \iff p(x_A, x_B) = p(x_A)\, p(x_B)$.
Conditional independence: $X_A \perp X_B \mid X_C \iff p(x_A, x_B \mid x_C) = p(x_A \mid x_C)\, p(x_B \mid x_C) \iff p(x_A \mid x_B, x_C) = p(x_A \mid x_C)$.
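
These definitions can be checked numerically by brute-force summation over the joint distribution; a sketch, reusing the hypothetical network, joint, and parents from the example above:

from itertools import product

def prob(event):
    """Marginal probability of a partial assignment, by brute-force summation."""
    total = 0.0
    nodes = list(parents)
    for values in product([True, False], repeat=len(nodes)):
        a = dict(zip(nodes, values))
        if all(a[k] == v for k, v in event.items()):
            total += joint(a)
    return total

# Rain and Sprinkler are marginally independent in this network:
lhs = prob({"Rain": True, "Sprinkler": True})
rhs = prob({"Rain": True}) * prob({"Sprinkler": True})
print(abs(lhs - rhs) < 1e-12)  # True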

Inference via BN. A BN is used to compute marginal distributions of one or more query nodes:
$$P(X \mid E = e) = \frac{P(X, e)}{P(e)} = \frac{\sum_{w} P(X, e, w)}{\sum_{x} P(x, e)}$$
E is the set of observed variables (the evidence nodes); X is the set of unobserved variables whose values we are interested in estimating (the query nodes); W are the random variables that are neither query nor evidence nodes.

BN Inference Example. What is the probability of an uncomfortable chair given the observation that the person suffers from backache?
$$P(C=T \mid A=T) = \frac{P(C=T, A=T)}{P(A=T)}$$
$$P(C=T, A=T) = \sum_{S, W, B \in \{T, F\}} P(C=T)\, P(S)\, P(W \mid C=T)\, P(B \mid S, C=T)\, P(A=T \mid B)$$
$$P(A=T) = \sum_{S, W, B, C \in \{T, F\}} P(C)\, P(S)\, P(W \mid C)\, P(B \mid S, C)\, P(A=T \mid B)$$
The number of terms in the sum grows exponentially with the number of hidden nodes.
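
This query pattern is exactly brute-force enumeration; a minimal sketch (on the hypothetical Rain/Sprinkler network, since the slides do not give the chair-example CPTs), reusing prob from the previous sketch:

def query(x_event, e_event):
    """P(X = x | E = e) = P(x, e) / P(e), summing out the remaining nodes W."""
    return prob({**x_event, **e_event}) / prob(e_event)

print(query({"Rain": True}, {"WetGrass": True}))  # ~0.645 with the CPTs above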

BN Inference. Exact inference is an NP-hard problem. Some algorithms exist for restricted classes of networks (e.g., message passing); otherwise, approximate inference methods are used (e.g., Markov chain Monte Carlo).
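
As one concrete instance of approximate inference (rejection sampling rather than MCMC, chosen here for brevity), this hedged sketch forward-samples the hypothetical network in topological order and keeps only the samples consistent with the evidence:

import random

ORDER = ["Rain", "Sprinkler", "WetGrass"]  # a topological order of the DAG

def forward_sample():
    """Draw one full assignment by sampling each node given its parents."""
    a = {}
    for node in ORDER:
        key_true = (True,) + tuple(a[q] for q in parents[node])
        a[node] = random.random() < cpt[node][key_true]
    return a

def rejection_query(x_event, e_event, n=100_000):
    """Estimate P(x | e) as (#samples matching x and e) / (#samples matching e)."""
    kept = accepted = 0
    for _ in range(n):
        s = forward_sample()
        if all(s[k] == v for k, v in e_event.items()):
            kept += 1
            if all(s[k] == v for k, v in x_event.items()):
                accepted += 1
    return accepted / kept if kept else float("nan")

print(rejection_query({"Rain": True}, {"WetGrass": True}))  # close to the exact ~0.645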

BN Learning. When the BN is unknown, we want to learn it from data. Given training data and prior information (e.g., expert knowledge, causal relationships), estimate the graph topology (network structure) and the parameters of the joint probability distribution of the BN.
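
For the parameter-estimation half of this task, a standard approach (not detailed in the slides) is maximum-likelihood counting from complete data; a sketch, reusing forward_sample from the previous example:

from collections import Counter

def fit_cpt(node, data, parents):
    """Maximum-likelihood CPT estimate: theta_{x|pi} = N(x, pi) / N(pi)."""
    joint_counts, parent_counts = Counter(), Counter()
    for row in data:  # each row: dict node -> value
        pa = tuple(row[q] for q in parents[node])
        joint_counts[(row[node],) + pa] += 1
        parent_counts[pa] += 1
    return {k: v / parent_counts[k[1:]] for k, v in joint_counts.items()}

# Example: relearn WetGrass's CPT from samples of the hypothetical network.
data = [forward_sample() for _ in range(10_000)]
print(fit_cpt("WetGrass", data, parents))  # close to cpt["WetGrass"]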

BN x HMM. A hidden Markov model can be drawn as a Bayesian network over time: $S_t$ (system state at time t) takes values in {a, b, c}, and $O_t$ (system output at time t) takes values in {x, y, z}.
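
Written as a Bayesian network, the HMM's joint distribution is just the general factorization above applied to the chain structure:
$$P(S_{1:T}, O_{1:T}) = P(S_1)\, P(O_1 \mid S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1})\, P(O_t \mid S_t)$$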

BN x Reliability Block Diagram

D-Separation. Independence properties in probabilistic graphical models can be exploited to reduce the computational cost of query processing. If we know that a set of nodes X is independent of a set of nodes Y given E, then in the presence of E, variables in X cannot influence the beliefs about Y. The independence assumptions satisfied by a Bayesian network can be identified using a graphical test called d-separation. Four general cases determine whether a variable X can change the beliefs about a variable Y, depending on the evidence status of an intermediate variable E.

D-Separation: First case. Represents an indirect causal effect (a chain X → E → Y), where an ancestor X of Y can pass influence via E. The interpretation is that the past is independent of the future given the present: X can influence Y via E only if E is not observed. Observing the evidence E blocks the influence of X over Y.

D-Separation: First case example. Chain: gene Allen received from her mother (Lily) → gene Susan received from her mother (Allen) → blood type of Susan. If we know the gene Susan received from her mother, the gene passed down by Susan's grandmother no longer influences her blood type.

D-Separation: Second case. The symmetrical case of the first one: we want to know whether evidence about a descendant may affect an indirect ancestor. It is an indirect evidential effect. Observing E blocks the influence of Y over X.

D-Separation: Second case example. Same chain: gene Allen received from her mother (Lily) → gene Susan received from her mother (Allen) → blood type of Susan. Once the gene Susan received from her mother Allen is known, the blood type of Susan can no longer affect the beliefs about the gene Allen received from her mother Lily.

D-Separation: Third case. The node E is a common cause of the nodes X and Y (a fork X ← E → Y). Observing E blocks the influence of X over Y.

D-Separation: Third case example. Age is a common cause of shoe size and amount of gray hair. The shoe size and the amount of gray hair of a person are highly dependent; given age, they are independent, since age provides all the information that shoe size could provide to infer the amount of gray hair.

D-Separation: Fourth case. The common-effect trail (a v-structure X → E ← Y). In the three previous cases, X can influence Y via E if and only if E is not observed; in this case, X can influence Y via E if and only if E (or one of its descendants) is observed. If the common-effect variable is not observed, knowing about one parent variable cannot affect the expectation about the other parents.

D-Separation: Fourth case example. Traveling on vacations → Late for lunch ← Missed the bus. If we do not know the evidence Late for lunch, knowing about Missed the bus cannot affect the expectation about Traveling on vacations. If we know Late for lunch, it increases the probability of Missed the bus, and the two parent variables become dependent (explaining away).
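
This dependence reversal can be checked numerically on the hypothetical Rain/Sprinkler network from the earlier sketches, where Rain → WetGrass ← Sprinkler is exactly such a v-structure:

# Rain and Sprinkler are marginally independent:
print(query({"Rain": True}, {"Sprinkler": True}))  # 0.2, same as prob({"Rain": True})
# Observing the common effect makes them dependent (explaining away):
print(query({"Rain": True}, {"WetGrass": True}))                      # ~0.645
print(query({"Rain": True}, {"WetGrass": True, "Sprinkler": True}))   # ~0.236, differs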

Block definition. Definition 1 (Block): Let X and Y be random variables in the graph of a Bayesian network. We say an undirected path between X and Y is blocked by a (set of) variables E if E is on the path and the influence of X cannot reach Y and change the beliefs about it, because of the evidence (or lack of it) on E.

Active Trail definition. Definition 2 (Active trail): Given an undirected path $X_1 - \ldots - X_n$ in the graph component of a Bayesian network, there is an active trail from $X_1$ to $X_n$ given a subset E of the observed variables if: whenever the path contains a v-structure $X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$, then $X_i$ or one of its descendants is in E; and no other node along the trail is in E.

V-structure. Whenever we have a v-structure, the trail through it is active only if F or one of its descendants is in E. Example: Traveling on vacations → F (Late for lunch) ← Missed the bus, where F' (Missed the meeting) is a descendant of F.

Active Trail Example. Are A and H independent, or is there an active trail between them? [Figure: a graph over nodes A, B, C, D, E, F, G, H, I containing two v-structures.] There is an active trail if: B, D, E, G are not observed; C and (F or I) are observed.

Active Trail Example. A and H are independent in the following case: observe at least one of B, D, E, G; or do not observe C or (F and I).

D-separation definition. Definition 3 (d-separation): X and Y are d-separated by a set of evidence variables E if there is no active trail between any node $x \in X$ and $y \in Y$ given E, i.e., every undirected path from X to Y is blocked. The d-separation test guarantees that X and Y are independent given E if every path between X and Y is blocked by E, and therefore influence cannot flow from X through E to affect the beliefs about Y. $X \perp Y \mid E$ if and only if E d-separates X from Y in the graph G.
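
Finally, a hedged Python sketch of the d-separation test itself, using the standard active-trail reachability algorithm over the same parents-dict representation as the earlier sketches (it assumes the query nodes are not themselves evidence): a chain or fork node blocks when observed, and a collider is open only when it or one of its descendants is observed.

def d_separated(x, y, evidence, parents):
    """True iff every undirected path between x and y is blocked by `evidence`."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    # A collider is open iff it is in the evidence or has a descendant there,
    # i.e., iff it belongs to the evidence or to the ancestors of the evidence.
    opens_collider = set(evidence)
    frontier = list(evidence)
    while frontier:
        for p in parents[frontier.pop()]:
            if p not in opens_collider:
                opens_collider.add(p)
                frontier.append(p)
    # Search states: (node, "up") = reached from a child; (node, "down") = from a parent.
    visited, stack = set(), [(x, "up")]
    while stack:
        node, direction = stack.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y:
            return False  # an active trail reached y
        if direction == "up" and node not in evidence:
            for p in parents[node]:       # continue upward (chain)
                stack.append((p, "up"))
            for c in children[node]:      # node acts as a fork (common cause)
                stack.append((c, "down"))
        elif direction == "down":
            if node not in evidence:      # chain continues downward
                for c in children[node]:
                    stack.append((c, "down"))
            if node in opens_collider:    # v-structure: open only here
                for p in parents[node]:
                    stack.append((p, "up"))
    return True

print(d_separated("Rain", "Sprinkler", set(), parents))         # True: collider unobserved
print(d_separated("Rain", "Sprinkler", {"WetGrass"}, parents))  # False: collider observed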