Special Topics in Modeling and Analysis - Machine Learning CPS863

Daniel, Edmundo, Rosa. Third quarter of 2012. UFRJ - COPPE, Programa de Engenharia de Sistemas e Computação (Systems Engineering and Computer Science Program)

Bayesian Networks Definition
A Bayesian network B is a directed acyclic graph (DAG).
It represents a joint probability distribution over a set of random variables.

Bayesian Networks Definition
The network is defined by a pair $B = (G, \Theta)$, where G is the DAG whose nodes $X_1, X_2, \dots, X_n$ represent random variables and whose edges represent the direct dependences between these variables.
If the variable represented by a node is observed, the node is an evidence node; otherwise it is a hidden node.

Bayesian Networks Definition
The graph G encodes independence assumptions: each variable $X_i$ is independent of its nondescendants given its parents in G.
$\Theta$ is the set of parameters of the network. It contains one parameter $\theta_{x_i \mid \pi_i} = P(x_i \mid \pi_i)$ for each realization $x_i$ of $X_i$ conditioned on $\pi_i$, the set of parents of $X_i$ in G.

Bayesian Networks Definition
B defines a unique joint probability distribution over the set of variables $V = \{X_1, X_2, \dots, X_n\}$:
$$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi_i) = \prod_{i=1}^{n} \theta_{X_i \mid \pi_i}$$
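
The factorization can be made concrete with a minimal sketch. The three-node network (X1 -> X2, X1 -> X3), the names `parents`, `theta` and `joint_probability`, and all numeric parameters are assumptions made up for illustration; none of them come from the slides. The sketch multiplies one conditional probability per node and checks that the resulting joint distribution sums to one.

```python
from itertools import product

# Hypothetical structure: X1 -> X2 and X1 -> X3, all variables binary.
parents = {"X1": (), "X2": ("X1",), "X3": ("X1",)}

# theta[node][parent_values] = P(node = True | parents = parent_values).
# The numbers are invented for illustration only.
theta = {
    "X1": {(): 0.3},
    "X2": {(True,): 0.9, (False,): 0.2},
    "X3": {(True,): 0.6, (False,): 0.1},
}


def joint_probability(assignment):
    """Return P(X1, ..., Xn) as the product of P(X_i | parents of X_i)."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        p_true = theta[node][pa_values]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


nodes = list(parents)
# Summing the joint over all 2^n assignments should give 1.
total = sum(
    joint_probability(dict(zip(nodes, values)))
    for values in product([True, False], repeat=len(nodes))
)
print(total)
```

Storing one table per node, indexed by the parents' values, mirrors the parameter set Θ: the network needs only the local conditional tables, not the full joint table.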

Conditional Independence
Independence: $X_A \perp X_B \iff p(x_A, x_B) = p(x_A)\, p(x_B)$
Conditional independence: $X_A \perp X_B \mid X_C \iff p(x_A, x_B \mid x_C) = p(x_A \mid x_C)\, p(x_B \mid x_C) \iff p(x_A \mid x_B, x_C) = p(x_A \mid x_C)$

Inference via BN
A BN is used to compute marginal distributions of one or more query nodes:
$$P(X \mid E = e) = \frac{P(X, e)}{P(e)} = \frac{\sum_{w} P(X, e, w)}{\sum_{x} P(x, e)}$$
E is the set of observed variables (the evidence nodes).
X is the set of unobserved variables whose values we are interested in estimating (the query nodes).
W is the set of random variables that are neither query nor evidence nodes.

BN Inference Example
What is the probability of an uncomfortable chair (C) given the observation that the person suffers from backache (A)?
$$P(C=T \mid A=T) = \frac{P(C=T, A=T)}{P(A=T)}$$
$$P(C=T, A=T) = \sum_{S,W,B \in \{T,F\}} P(C=T)\, P(S)\, P(W \mid C=T)\, P(B \mid S, C=T)\, P(A=T \mid B)$$
$$P(A=T) = \sum_{S,W,B,C \in \{T,F\}} P(C)\, P(S)\, P(W \mid C)\, P(B \mid S, C)\, P(A=T \mid B)$$
The number of terms in the sum grows exponentially with the number of hidden nodes.
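
The two sums in this example can be evaluated by brute-force enumeration, sketched below. The slides give no numeric parameters, so every probability here is an assumed, illustrative value, and the helper names `bern` and `joint` are hypothetical. Only the factorization P(C) P(S) P(W|C) P(B|S,C) P(A|B) is taken from the slide.

```python
from itertools import product

# Illustrative parameters (NOT from the slides): probability of each variable being True.
P_C = 0.8                                   # P(C = T)
P_S = 0.02                                  # P(S = T)
P_W_given_C = {True: 0.9, False: 0.01}      # P(W = T | C)
P_B_given_SC = {                            # P(B = T | S, C), keyed by (S, C)
    (True, True): 0.9, (True, False): 0.9,
    (False, True): 0.2, (False, False): 0.01,
}
P_A_given_B = {True: 0.7, False: 0.1}       # P(A = T | B)


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(c, s, w, b, a):
    """P(C, S, W, B, A) using the factorization shown on the slide."""
    return (bern(P_C, c) * bern(P_S, s) * bern(P_W_given_C[c], w)
            * bern(P_B_given_SC[(s, c)], b) * bern(P_A_given_B[b], a))


TF = [True, False]
# Numerator: sum over the hidden nodes S, W, B with C = T and A = T fixed.
p_c_and_a = sum(joint(True, s, w, b, True) for s, w, b in product(TF, repeat=3))
# Denominator: sum over S, W, B and C with A = T fixed.
p_a = sum(joint(c, s, w, b, True) for c, s, w, b in product(TF, repeat=4))

posterior = p_c_and_a / p_a   # P(C = T | A = T)
print(posterior)
```

The denominator already needs 2^4 terms for four hidden binary nodes, which is the exponential growth the slide warns about.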

BN Inference
Exact inference is NP-hard in general.
Some algorithms apply to restricted classes of networks (e.g., message passing).
Approximate inference methods are also available (e.g., Markov chain Monte Carlo).

BN Learning
The BN is unknown and we want to learn it from data.
Given training data and prior information (e.g., expert knowledge, causal relationships), estimate the graph topology (network structure) and the parameters of the joint probability distribution (JPD) in the BN.
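
The parameter-estimation half of the learning problem can be sketched in its simplest setting: the structure is assumed known, the data are complete, and parameters are estimated by maximum likelihood, that is, by relative frequencies. The two-node network C -> W, the records in `data`, and the function name `estimate_cpt` are hypothetical. Structure learning is not covered by this sketch.

```python
from collections import Counter

# Hypothetical complete data set for a two-node network C -> W.
# Each record is one observed (C, W) pair; the values are invented.
data = [
    (True, True), (True, True), (True, False), (False, False),
    (False, False), (True, True), (False, True), (False, False),
]


def estimate_cpt(records):
    """Maximum-likelihood estimate of P(W = True | C) from counts."""
    pair_counts = Counter(records)                    # counts of (C, W) pairs
    parent_counts = Counter(c for c, _ in records)    # counts of C alone
    return {c: pair_counts[(c, True)] / parent_counts[c] for c in parent_counts}


cpt_w = estimate_cpt(data)                            # estimated P(W = T | C)
p_c = sum(1 for c, _ in data if c) / len(data)        # estimated P(C = T)
print(p_c, cpt_w)
```

With prior information, the raw counts could be combined with pseudo-counts instead of being used alone; that variation is left out to keep the sketch short.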

BN x HMM
$S_t$ (system state at time t) takes values in {a, b, c}.
$O_t$ (system output at time t) takes values in {x, y, z}.

BN x Reliability Block Diagram

D-Separation
Independence properties in probabilistic graphical models can be exploited to reduce the computational cost of answering queries.
If we know that a set of nodes X is independent of a set of nodes Y given E, then in the presence of E, variables in X cannot influence the beliefs about Y.
The independence assumptions satisfied by a Bayesian network can be identified using a graphical test called d-separation.
Four general cases determine whether, given an evidence variable E, knowledge about a variable X can change the beliefs about a variable Y.

D-Separation: First case
Represents an indirect causal effect, where an ancestor X of Y could pass influence via E (X → E → Y).
The interpretation is that the past is independent of the future given the present.
X can influence Y via E only if E is not observed.
Evidence E blocks the influence of X over Y.

D-Separation: First case example
Nodes, from ancestor to descendant: the gene Allen received from her mother (Lily); the gene Susan received from her mother (Allen); the blood type of Susan.
If we know the gene Susan received from her mother, the gene passed down from Susan's grandmother no longer influences her blood type.

D-Separation: Second case
Symmetrical to the first case: we want to know whether evidence about a descendant may affect an indirect ancestor.
It is an indirect evidential effect.
Evidence E blocks the influence of Y over X.

D-Separation: Second case example
Nodes, from ancestor to descendant: the gene Allen received from her mother (Lily); the gene Susan received from her mother (Allen); the blood type of Susan.
Once the gene Susan received from her mother Allen is known, the blood type of Susan can no longer affect the beliefs about the gene Allen received from her mother Lily.

D-Separation: Third case
The node E is a common cause of nodes X and Y (X ← E → Y).
Evidence E blocks the influence of X over Y.

D-Separation: Third case example
Nodes: age (common cause), shoe size, amount of gray hair.
Shoe size and amount of gray hair of a person are highly dependent when age is not known.
Given age, they are independent, since age provides all the information that shoe size could provide to infer the amount of gray hair.

D-Separation: Fourth case
Common effect trail (X → E ← Y).
Three previous cases: X can influence Y via E if and only if E is not observed.
This case: X can influence Y via E if and only if E (or one of its descendants) is observed.
If the common effect variable is not observed, knowing about one parent variable cannot affect the expectation about the other parents.

D-Separation: Fourth case example
Nodes: traveling on vacation and missed the bus (causes), late for lunch (common effect).
If we have no evidence about late for lunch, knowing about missed the bus cannot affect the expectation about traveling on vacation.
If we know the person was late for lunch, the probability of missed the bus increases, and the two causes are no longer independent.
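
The common-effect behaviour can be checked numerically with a small sketch. The slides give no parameters for this example, so the values of `P_V`, `P_M` and `P_L` are assumptions chosen for illustration, and the helper `prob` is a hypothetical brute-force query function. The structure is V (traveling on vacation) -> L (late for lunch) <- M (missed the bus).

```python
from itertools import product

# Made-up parameters for the common-effect network V -> L <- M.
P_V = 0.1                                   # P(traveling on vacation)
P_M = 0.2                                   # P(missed the bus)
P_L = {                                     # P(late for lunch | V, M)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(v, m, l):
    """P(V, M, L) = P(V) P(M) P(L | V, M)."""
    return bern(P_V, v) * bern(P_M, m) * bern(P_L[(v, m)], l)


def prob(query, evidence):
    """P(query | evidence) by enumeration; both are dicts over 'V', 'M', 'L'."""
    numerator = denominator = 0.0
    for v, m, l in product([True, False], repeat=3):
        world = {"V": v, "M": m, "L": l}
        if all(world[k] == val for k, val in evidence.items()):
            p = joint(v, m, l)
            denominator += p
            if all(world[k] == val for k, val in query.items()):
                numerator += p
    return numerator / denominator


# L unobserved: learning M does not change the belief about V.
print(prob({"V": True}, {}), prob({"V": True}, {"M": True}))
# L observed: learning M now changes the belief about V.
print(prob({"V": True}, {"L": True}), prob({"V": True}, {"L": True, "M": True}))
```

With these assumed numbers, observing the missed bus on top of the late lunch lowers the belief in the vacation, because the bus already accounts for the lateness.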

Block definition
Definition 1 (Block): Let X and Y be random variables in the graph of a Bayesian network. We say that an undirected path between X and Y is blocked by a variable (or set of variables) E if E lies on the path and, because of the evidence about E (or the lack of it), the influence of X cannot reach Y and change the beliefs about it.

Active Trail definition
Definition 2 (Active trail): Given an undirected path $X_1 \dots X_n$ in the graph component of a Bayesian network, there is an active trail from $X_1$ to $X_n$ given a subset E of observed variables if:
whenever we have a v-structure $X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$, then $X_i$ or one of its descendants is in E;
in all other cases, no other node along the trail is in E.

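V-structure
Whenever we have a v-structure with middle node F, the trail is active only if F or one of its descendants is in E.
Example: traveling on vacation and missed the bus are the parents of F (late for lunch); F' (missed the meeting) is a descendant of F.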

Active Trail Example
Are A and H independent, or is there an active trail between them?
[Figure: network over the nodes A, B, C, D, E, F, G, H, I, with two v-structures marked.]
There is an active trail when B, D, E and G are not observed, and C and (F or I) are observed.

Active Trail Example
A and H are independent in the following cases:
[Figure: network over the nodes A, B, C, D, E, F, G, H, I, with two v-structures marked.]
At least one of B, D, E, G is observed; or C is not observed; or neither F nor I is observed.

D-separation definition
Definition 3 (d-separation): X and Y are d-separated by a set of evidence variables E if there is no active trail between any node $x \in X$ and any node $y \in Y$ given E, i.e., every undirected path from X to Y is blocked.
The d-separation test guarantees that X and Y are independent given E if every path between X and Y is blocked by E, so that influence cannot flow from X through E to affect the beliefs about Y.
For the independences implied by the structure of G: $X \perp Y \mid E$ if and only if E d-separates X from Y in the graph G.
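
A d-separation test can be sketched in a few lines. Instead of enumerating trails as in the definitions above, this sketch uses the equivalent moralized ancestral graph criterion: keep only X, Y, E and their ancestors, connect parents that share a child, drop edge directions, remove E, and check whether X can still reach Y. The function name `d_separated`, the `parents` dictionary representation, and the node names are assumptions for illustration; the example network is the vacation example with the extra descendant "missed the meeting".

```python
def d_separated(parents, xs, ys, evidence):
    """Return True if every node in xs is d-separated from every node in ys given evidence.

    parents maps each node to the list of its parents (root nodes may be omitted).
    """
    xs, ys, evidence = set(xs), set(ys), set(evidence)

    # Step 1: keep only xs, ys, evidence and all of their ancestors.
    relevant, stack = set(), list(xs | ys | evidence)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))

    # Step 2: moralize (link parents sharing a child) and drop edge directions.
    neighbors = {node: set() for node in relevant}
    for child in relevant:
        pas = list(parents.get(child, ()))
        for i, p in enumerate(pas):
            neighbors[child].add(p)
            neighbors[p].add(child)
            for q in pas[i + 1:]:
                neighbors[p].add(q)
                neighbors[q].add(p)

    # Step 3: search from xs without ever entering an evidence node.
    frontier = [x for x in xs if x not in evidence]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in ys:
            return False  # an unblocked connection exists
        for nb in neighbors[node] - evidence - seen:
            seen.add(nb)
            frontier.append(nb)
    return True


# Vacation example: two causes, one common effect, one descendant of the effect.
network = {
    "late_for_lunch": ["vacation", "missed_bus"],
    "missed_meeting": ["late_for_lunch"],
}
print(d_separated(network, {"vacation"}, {"missed_bus"}, set()))               # True
print(d_separated(network, {"vacation"}, {"missed_bus"}, {"late_for_lunch"}))  # False
print(d_separated(network, {"vacation"}, {"missed_bus"}, {"missed_meeting"}))  # False
```

The third call shows the descendant rule: observing "missed the meeting" activates the v-structure even though "late for lunch" itself is not observed.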