Probabilistic Graphical Models
Bayesian Networks: Definition and Basic Properties
Renato Martins Assunção
DCC, UFMG - 2015
What's the use of a BN?
Y = (Y_1, Y_2, ..., Y_k) is a random vector, composed of k random variables. If they are binary, the sample space Ω has 2^k elements. To specify P(Y = y) means assigning values to the 2^k possible configurations of y. This is not practical even if k is moderate. A BN simplifies this task by using only local influences. It also allows probability calculations more efficient than O(2^k).
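To make the savings concrete, here is a minimal sketch comparing the number of free parameters in a full joint table with those in a BN factorization. The chain structure X_1 → X_2 → ... → X_k (each node with a single parent) is a hypothetical example, not a network from these slides.

```python
# Sketch: parameter count for a full joint vs. a BN factorization.
# Hypothetical structure: a chain X1 -> X2 -> ... -> Xk of binary
# variables, where each node has exactly one parent.

def full_joint_params(k):
    # A full joint over k binary variables needs 2^k - 1 free probabilities
    # (the last one is determined because they sum to 1).
    return 2**k - 1

def chain_bn_params(k):
    # Root: 1 free probability; each of the k-1 children needs 2
    # (one conditional probability per parent value).
    return 1 + 2 * (k - 1)

print(full_joint_params(20))   # 1048575
print(chain_bn_params(20))     # 39
```

For k = 20 binary variables, the full table has over a million entries, while the chain BN needs only 39 numbers: the dependence structure, not the dimension, drives the cost.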
Local influence
A causal diagram is represented in the form of a graph. We calculate the probability of a random variable conditioned ONLY on its parents. Hence, we look only at the most immediate ancestors, which have a direct influence on it. There is no need to look at the grandparents, great-grandparents, ...: only the parents. This is enough to obtain the joint distribution of all variables in the graph.
A BN is a model for the JOINT probability distribution of a random vector Y = (Y_1, Y_2, ..., Y_k). Y can have a very large dimension. Basic idea: represent the causal relationships among the variables in Y by means of a directed graph. The representation is only an approximation of reality.
An example
Figure: Five variables: Friends?, Rain?, Game?, Cloth Change?, Afternoon Activity
DAG
The graph must be a DAG: directed and acyclic. DAG: directed acyclic graph. Acyclic: no directed loops; that is, no cycle with edges pointing in the same direction along the way.
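The acyclicity condition can be checked mechanically. Below is a minimal sketch using Kahn's algorithm (repeatedly removing nodes with no incoming edges); the node and edge sets are illustrative, not from the slides' figures.

```python
from collections import deque

def is_dag(nodes, edges):
    """Check acyclicity with Kahn's algorithm: repeatedly remove nodes
    of in-degree zero; a directed cycle leaves some nodes unremoved."""
    indeg = {v: 0 for v in nodes}
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(nodes)

print(is_dag("abc", [("a", "b"), ("b", "c")]))             # True
print(is_dag("abc", [("a", "b"), ("b", "c"), ("c", "a")])) # False: a cycle
```

The same removal order, when it exists, is a topological ordering of the nodes, which is exactly the variable ordering used later in the chain rule factorization.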
Examples of DAGs
Examples of non-DAGs
Back to BNs
Specify a DAG representing the causal relationships between the random variables. Each variable is represented by a node in the DAG. Specify a probability for each node conditioned on the values of its parent variables. This is called a CPT: conditional probability table (or distribution). And, voilà, that's all, folks...
Daphne Koller example (from Coursera)
DAG + CPTs
Describing CPTs
Bayesian Network
These two ingredients, the DAG and the set of CPTs, are combined by the chain rule. The result is the BAYESIAN NETWORK. We obtain the JOINT distribution of ALL variables.
Chain rule with two events
A and B are events.
P(B | A) = P(A ∩ B) / P(A)
Hence, P(A ∩ B) = P(A) P(B | A).
This is always valid, for any pair of events. This rule can also be used for two random variables.
Chain rule with two discrete random variables
Case 1: X and Y are discrete random variables. Take any two possible values x and y for X and Y, for example x = 0 and y = 2. These define two events: A = [X = 0] and B = [Y = 2]. Applying the chain rule to these two events, we have
Chain rule with two discrete random variables
Applying the chain rule to these two events, we have
P(X = 0, Y = 2) = P(A ∩ B) = P(A) P(B | A) = P(X = 0) P(Y = 2 | X = 0)
This is valid for ANY choice of x and y, and therefore
P(Y = y, X = x) = P(X = x) P(Y = y | X = x)
for all values x and y in the support set of the distributions.
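The identity above can be checked numerically on any joint table. Here is a minimal sketch with a hypothetical joint distribution for two discrete variables (the numbers are illustrative only): the conditional is computed as a ratio, and the product recovers the joint entry exactly.

```python
# Sketch: verify P(X=x, Y=y) = P(X=x) * P(Y=y | X=x) on a small
# hypothetical joint table (values chosen for illustration only).
joint = {(0, 2): 0.1, (0, 5): 0.3, (1, 2): 0.4, (1, 5): 0.2}

def p_x(x):
    # Marginal of X: sum the joint over all values of Y.
    return sum(p for (xv, _), p in joint.items() if xv == x)

def p_y_given_x(y, x):
    # Conditional P(Y=y | X=x) as a ratio of joint to marginal.
    return joint[(x, y)] / p_x(x)

x, y = 0, 2
lhs = joint[(x, y)]
rhs = p_x(x) * p_y_given_x(y, x)
print(abs(lhs - rhs) < 1e-12)  # True: the chain rule holds exactly
```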
Chain rule with two continuous random variables
The result is also valid for continuous random variables. In this case, the joint probability density function factors as the product
f_{XY}(x, y) = f_X(x) f_{Y|X}(y | x)
Dropping the sub-indexes, we can write more concisely
f(x, y) = f(x) f(y | x)
Chain rule with 3 events
3 events: P(A ∩ B ∩ C). Take D = A ∩ B as a single event and apply the chain rule to the two events D and C:
P(A ∩ B ∩ C) = P(D) P(C | D)
= P(A ∩ B) P(C | A ∩ B)
Applying the two-event rule again:
= P(A) P(B | A) P(C | A ∩ B)
Chain rule with 3 RANDOM VARIABLES
3 discrete random variables X, Y, Z, and a configuration x, y, z of possible values. This defines 3 events: A = [X = x], B = [Y = y], C = [Z = z]. Applying the chain rule to these events, we obtain the JOINT distribution
P(X = x, Y = y, Z = z) = P(X = x) P(Y = y | X = x) P(Z = z | X = x, Y = y)
It is easy to see that we can generalize to n events.
Chain rule in general
n events A_1, A_2, ..., A_n. Take B = A_1 ∩ ... ∩ A_{n-1}:
P(A_1 ∩ ... ∩ A_{n-1} ∩ A_n) = P(B) P(A_n | B)
= P(A_1 ∩ ... ∩ A_{n-1}) P(A_n | A_1 ∩ ... ∩ A_{n-1})
= P(A_1 ∩ ... ∩ A_{n-2}) P(A_{n-1} | A_1 ∩ ... ∩ A_{n-2}) P(A_n | A_1 ∩ ... ∩ A_{n-1})
= ...
= P(A_1) P(A_2 | A_1) P(A_3 | A_1 ∩ A_2) ... P(A_n | A_1 ∩ ... ∩ A_{n-1})
Chain rule in general
P(A_1 ∩ ... ∩ A_{n-1} ∩ A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_1 ∩ A_2) ... P(A_n | A_1 ∩ ... ∩ A_{n-1})
Why is this rule important? Because it gives a simple way to calculate a probability involving many simultaneous events. Rather than calculating the probability of all events simultaneously, we break the task down into a sequence of probability calculations. First, we order the events (above, we used the natural order 1, 2, ..., n). Then we calculate the probability of each SINGLE event A_i CONDITIONED on the previous events A_1, A_2, ..., A_{i-1}. As it happens, this is commonly easier than the simultaneous calculation.
Chain rule for n random variables
n discrete random variables X_1, X_2, ..., X_n, and a configuration x_1, x_2, ..., x_n of possible values for these r.v.'s. This defines n events: A_1 = [X_1 = x_1], A_2 = [X_2 = x_2], ..., A_n = [X_n = x_n]. Applying the chain rule to these n events, we obtain the JOINT distribution
P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = P(X_1 = x_1) P(X_2 = x_2 | X_1 = x_1) ... P(X_n = x_n | X_1 = x_1, ..., X_{n-1} = x_{n-1})
This is ALWAYS valid, for any set of n random variables.
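Because the conditionals are ratios of marginals, the general factorization is a telescoping product, and it can be verified numerically on any joint table. The sketch below uses a hypothetical joint over three binary variables (the probabilities are illustrative only).

```python
import itertools

# Sketch: check the chain rule numerically for three binary variables,
# using a hypothetical joint table P(X1, X2, X3) (illustrative numbers
# that sum to 1).
probs = [0.05, 0.10, 0.15, 0.20, 0.05, 0.15, 0.10, 0.20]
joint = dict(zip(itertools.product([0, 1], repeat=3), probs))

def marg(prefix):
    """P(X1=x1, ..., Xm=xm): marginal of the first m variables."""
    m = len(prefix)
    return sum(p for bits, p in joint.items() if bits[:m] == tuple(prefix))

for bits in joint:
    # P(x1) * P(x2|x1) * P(x3|x1,x2), each conditional written as a
    # ratio of marginals; the product telescopes back to the joint.
    prod = marg(bits[:1])
    for i in range(1, 3):
        prod *= marg(bits[:i + 1]) / marg(bits[:i])
    assert abs(prod - joint[bits]) < 1e-12
print("chain rule verified for all 8 configurations")
```

Note that nothing here is an assumption: the factorization is an algebraic identity, which is why the slide says it is ALWAYS valid.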
Chain rule and BN
We ALWAYS have
P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = P(X_1 = x_1) P(X_2 = x_2 | X_1 = x_1) ... P(X_n = x_n | X_1 = x_1, ..., X_{n-1} = x_{n-1})
Note that the last r.v. X_n depends on ALL THE n-1 PREVIOUS r.v.'s. The main idea of a BN is to reduce this dependence on the ancestors drastically. The BN model ASSUMES THAT, in the chain rule factorization, each random variable depends ONLY ON ITS PARENT NODES. Older ancestors can be ignored if we condition on the parents: ignore grandparents, great-grandparents, etc.
Chain rule and BN
And where do the probabilities come from?
Figure: Probabilities come from the CPTs: conditional probability tables
Calculating a probability
P(D = d0, I = i1, G = g3, S = s1, L = l1) = P(D = d0) P(I = i1) P(G = g3 | D = d0, I = i1) P(S = s1 | I = i1) P(L = l1 | G = g3)
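The product above is just five table lookups multiplied together. Here is a minimal sketch of that computation; the CPT numbers below are illustrative placeholders, not the actual values from the slides' tables (which are not reproduced in this text).

```python
# Sketch: evaluate the BN factorization for the student-network query
#   P(D=d0, I=i1, G=g3, S=s1, L=l1)
# The CPT values are hypothetical, for illustration only.
cpt_d = {"d0": 0.6, "d1": 0.4}                 # P(D)
cpt_i = {"i0": 0.7, "i1": 0.3}                 # P(I)
cpt_g = {("g3", "d0", "i1"): 0.02}             # P(G=g3 | D=d0, I=i1)
cpt_s = {("s1", "i1"): 0.8}                    # P(S=s1 | I=i1)
cpt_l = {("l1", "g3"): 0.01}                   # P(L=l1 | G=g3)

# Each factor conditions only on the node's parents, never on all
# predecessors -- this is the BN simplification of the chain rule.
p = (cpt_d["d0"] * cpt_i["i1"]
     * cpt_g[("g3", "d0", "i1")]
     * cpt_s[("s1", "i1")]
     * cpt_l[("l1", "g3")])
print(p)  # 0.6 * 0.3 * 0.02 * 0.8 * 0.01
```

Compare with the full chain rule: G would otherwise condition on D and I (its parents, so nothing changes), but S would condition on D, I, and G, and L on all four predecessors. The DAG licenses dropping those extra conditions.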
Definition of BN
In R
Use the file bnlearn Intro.R to learn the first steps in defining a BN in R. You will also need the file survey.txt.