Causal Bayesian Networks

[Figure 1: Simple example. A four-node network: (1) Ste7, (2) Kss1, (3) Fus3, (4) Ste12, with arrows Ste7 → Kss1, Ste7 → Fus3, Kss1 → Ste12, and Fus3 → Ste12.]

While Bayesian networks should typically be viewed as acausal, it is possible to impose a causal interpretation on these models with additional care. What we get as a result is a probabilistic extension of the qualitative causal models. The extension is useful since the qualitative models discussed last time are a bit limited in their ability to quantify interactions, e.g., the fact that Ste7 may only have a certain probability of activating Kss1.

The modification necessary for maintaining a causal interpretation is exactly analogous to the qualitative models. Suppose we intervene and set the value of one variable, say we knock out Kss1. Then the mechanism through which Kss1 is activated by Ste7 can no longer be used for inference (it wasn't responsible for the value set in the intervention). Graphically, this just means deleting all the arrows into the variable(s) set in the intervention. More formally, the probability model associated with the graph in the figure above factors according to

    P(x_1) P(x_2 | x_1) P(x_3 | x_1) P(x_4 | x_2, x_3)    (1)

If we now intervene with set(x_2 = -1), i.e., knock out Kss1, then the probability model over the remaining variables is simply

    P(x_1) P(x_3 | x_1) P(x_4 | x_2 = -1, x_3)    (2)

Note that the parameters in the conditional tables are exactly as before; the only difference is that one of the conditionals is missing.
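To make the truncated factorization concrete, here is a minimal sketch in Python. The conditional probability tables are invented placeholders (the notes give no numerical tables), and each variable takes values in {-1, 0, 1}:

    import numpy as np

    VALS = [-1, 0, 1]
    idx = {v: i for i, v in enumerate(VALS)}

    p_x1 = np.array([0.2, 0.5, 0.3])                  # P(x1), placeholder
    p_x2_x1 = np.array([[0.7, 0.2, 0.1],              # P(x2 | x1), row = x1
                        [0.2, 0.6, 0.2],
                        [0.1, 0.2, 0.7]])
    p_x3_x1 = np.array([[0.6, 0.3, 0.1],              # P(x3 | x1)
                        [0.2, 0.6, 0.2],
                        [0.1, 0.3, 0.6]])
    p_x4_x2x3 = np.full((3, 3, 3), 1 / 3)             # P(x4 | x2, x3)

    def prob(x1, x2, x3, x4, intervened=()):
        # Joint probability; the conditional of any intervened variable is
        # dropped from the product, exactly as in eq. (2).
        factors = {
            "x1": p_x1[idx[x1]],
            "x2": p_x2_x1[idx[x1], idx[x2]],
            "x3": p_x3_x1[idx[x1], idx[x3]],
            "x4": p_x4_x2x3[idx[x2], idx[x3], idx[x4]],
        }
        return np.prod([v for k, v in factors.items() if k not in intervened])

    print(prob(1, -1, 1, 0))                          # observational, eq. (1)
    print(prob(1, -1, 1, 0, intervened=("x2",)))      # knock-out of Kss1, eq. (2)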

The estimation of Bayesian networks proceeds otherwise as before. For example, suppose we have two observed expression patterns, one arising from a causal intervention (knock-out), the other one not:

           x_1   x_2   x_3   x_4
    D_1:    1     1     1     1
    D_2:    0    -1     0     0     (set(x_2 = -1))

To estimate the parameters we just write down the log-likelihood of the observed data while taking into account that the model may have to be modified slightly when incorporating causal measurements:

    log P(D_1) + log P(D_2)
      = log P(x_1 = 1) + log P(x_2 = 1 | x_1 = 1) + log P(x_3 = 1 | x_1 = 1)
      + log P(x_4 = 1 | x_2 = 1, x_3 = 1)
      + log P(x_1 = 0) + log P(x_3 = 0 | x_1 = 0)
      + log P(x_4 = 0 | x_2 = -1, x_3 = 0)

We try to maximize this log-likelihood with respect to the parameters in the conditional probability tables. The maximum-likelihood setting of the parameters reduces to observed frequencies such as

    x_1    P̂(x_1)
    -1     0
     0     1/2
     1     1/2                        (3)
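A minimal sketch of this estimation step, using the two data points above: the only change relative to ordinary maximum likelihood is that an intervened variable contributes no count to its own conditional table. The dictionary-based encoding is an illustrative choice, not the notes' notation:

    from collections import Counter

    # Each data point is (assignment, set of intervened variables).
    data = [
        ({"x1": 1, "x2": 1, "x3": 1, "x4": 1}, set()),      # D1: observational
        ({"x1": 0, "x2": -1, "x3": 0, "x4": 0}, {"x2"}),    # D2: set(x2 = -1)
    ]
    parents = {"x1": [], "x2": ["x1"], "x3": ["x1"], "x4": ["x2", "x3"]}

    counts = {v: Counter() for v in parents}
    for assignment, intervened in data:
        for var, pa in parents.items():
            if var in intervened:
                continue  # the knock-out, not the mechanism, set this value
            key = tuple(assignment[p] for p in pa)
            counts[var][(key, assignment[var])] += 1

    def mle(var, value, parent_values=()):
        # Maximum likelihood = observed frequency within the parent config.
        total = sum(c for (k, _), c in counts[var].items() if k == parent_values)
        return counts[var][(parent_values, value)] / total if total else 0.0

    print([mle("x1", v) for v in (-1, 0, 1)])   # [0.0, 0.5, 0.5], as in (3)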

Decision tree models

Another issue with Bayesian networks is that the regulatory program represented by the network is not explicit in terms of the interactions involved, but rather buried in the parameters. For example:

[Figure: a gene g with two parents, A → g ← F]

where A represents the acetylation level in the neighborhood of gene g and F is a known regulator. We cannot infer from the graph how acetylation of the histone molecules might interact with the transcription factor to regulate the downstream gene. A more explicit description might look like:

    if A > 5 (enrichment) then
        if F > 5 (enrichment) then high transcription (Gaussian with a large mean)
        otherwise moderate transcription
    otherwise no transcription

This is a decision tree

[Figure: a decision tree with root "A > 5"; the "no" branch gives no transcription, and the "yes" branch tests "F > 5", with "no" giving moderate and "yes" giving high transcription]

that aims to capture the distribution over transcript levels conditionally on A and on F. Why is this more interpretable? Because we understand rules more easily than dependencies.

A decision tree representation of the interaction between the parents (causes) and the gene is not always compact, however. For example, we need a large decision tree to represent even a simple linear interaction, provided that the decisions (branchings) are based only on individual variables.

How can we estimate decision tree models? First, we find the variable with the largest effect on transcription (e.g., acetylation) and specify it as the root decision (as above).
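A minimal sketch of this greedy step, under the assumption that we observe (A, F, transcript level) triples and score single-variable threshold splits by the gain in Gaussian log-likelihood at the leaves (one plausible instance of the likelihood criterion discussed next); the synthetic data simply follow the rule above:

    import numpy as np

    def gaussian_loglik(y):
        # Log-likelihood of y under its own maximum-likelihood Gaussian fit.
        var = y.var() + 1e-6
        return -0.5 * len(y) * (np.log(2 * np.pi * var) + 1)

    def best_split(X, y, names, min_leaf=5):
        # Score every single-variable threshold split by its likelihood gain.
        base = gaussian_loglik(y)
        best = (None, None, 0.0)
        for j, name in enumerate(names):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue  # too little data left to justify the branch
                gain = gaussian_loglik(left) + gaussian_loglik(right) - base
                if gain > best[2]:
                    best = (name, t, gain)
        return best

    # Synthetic data following the rule above (invented for illustration).
    rng = np.random.default_rng(0)
    A, F = rng.uniform(0, 10, 200), rng.uniform(0, 10, 200)
    level = np.where(A > 5, np.where(F > 5, 10.0, 4.0), 0.0) + rng.normal(0, 1, 200)
    print(best_split(np.column_stack([A, F]), level, ["A", "F"]))  # root: A

Expanding each branch amounts to calling best_split again on the data falling into that branch, which is exactly the recursion described next.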

Then we expand each branch similarly according to some estimation criterion, such as the gain in the likelihood of the predictions. Expanding the tree endlessly is not helpful: the more leaves you have, the less data you have to justify that a branch should be expanded further. In other words, you can simply run out of data by expanding the tree.

Automated Experiment Design (active learning)

Estimating complex models from data typically requires more data than we have. One way to minimize the amount of data needed, either to estimate a model or to distinguish between competing models, is to optimize the selection of experiments to carry out. We consider here automating the selection process. Automated experiment design, or active learning, can be used in many biological contexts, including probe design, deciding which factor to profile, which gene to knock out, and so on. We will focus here for simplicity on gene deletions to discriminate among the qualitative causal models discussed earlier. A similar approach can be used with (causal) Bayesian networks as well. The models we consider are

[Figure: five candidate models M_0, M_1, M_2, M_3, M_4 relating the regulator F and the gene G through different causal structures; the edge drawings did not survive transcription]

There are four key problems we have to solve:

1. represent model uncertainty, i.e., P(M_i)
2. predict possible outcomes for each pair of model and experiment, e.g., P(G | M_1, set(F = -1))
3. evaluate the posterior probability over the models for each possible experiment and (hypothetical) outcome, e.g., compute P(M_1 | G, set(F = -1))
4. define the (information) gain from carrying out each possible experiment (deleting G or F in this case)
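To make the four problems concrete, the sketches on the following pages use the small setup below. The prior and the per-model outcome tables are invented placeholders (the numbers in the notes did not survive transcription), with outcomes coded as {-1, 0, 1}:

    import numpy as np

    OUTCOMES = [-1, 0, 1]   # under-expressed / unchanged / over-expressed

    # Problem 1: a prior over the five candidate models (uniform placeholder).
    prior = np.full(5, 0.2)

    # Problem 2 inputs: P(outcome | model, experiment); rows = models M0..M4,
    # columns = outcomes. All numbers are invented for illustration.
    pred = {
        "set(F=-1)": np.array([[0.1, 0.8, 0.1],
                               [0.8, 0.1, 0.1],
                               [0.1, 0.8, 0.1],
                               [0.7, 0.2, 0.1],
                               [0.3, 0.4, 0.3]]),
        "set(G=-1)": np.array([[0.1, 0.8, 0.1],
                               [0.1, 0.8, 0.1],
                               [0.8, 0.1, 0.1],
                               [0.2, 0.7, 0.1],
                               [0.3, 0.3, 0.4]]),
    }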

We will solve these problems in turn when the two possible experiments are gene deletions: set(F = -1) or set(G = -1).

Problem 1

As before, it is simple to define a distribution over the few possible models. This is obviously somewhat harder when we have a large number of possible models (we would need to decompose the probability into a probability of each model feature).

[Figure: the five models again, each annotated with its prior probability P(M_0), ..., P(M_4); the surviving fragments suggest priors in eighths, but the exact values were lost in transcription]

Can you guess which deletion we should carry out to maximally reduce the space of possible models?

Problem 2

We have already discussed how to evaluate these probabilities. For example:

    P(G = 0 | set(F = -1)) = P(M_0) P(G = 0 | M_0, set(F = -1)) + ... + P(M_4) P(G = 0 | M_4, set(F = -1))

We get two tables corresponding to the two possible experiments: P(G | set(F = -1)) over G ∈ {-1, 0, 1}, and P(F | set(G = -1)) over F ∈ {-1, 0, 1}. [The numerical entries of both tables were lost in transcription.]
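Continuing the sketch above (reusing prior, pred, and OUTCOMES), each table entry is a prior-weighted average of the per-model predictions:

    # P(G = g | set(F=-1)) = sum_i P(M_i) P(G = g | M_i, set(F=-1))
    p_G = prior @ pred["set(F=-1)"]
    p_F = prior @ pred["set(G=-1)"]
    print(dict(zip(OUTCOMES, p_G)))   # table for the F deletion
    print(dict(zip(OUTCOMES, p_F)))   # table for the G deletion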

Problem 3

We have to evaluate

    P(M_i | set(F = -1), G) = P(M_i) P(G | set(F = -1), M_i) / P(G | set(F = -1))    (4)

for all experiments (here set(F = -1)) and all outcomes (here G = -1, 0, 1).

Problem 4

We define the gain as the reduction of model uncertainty (information gain), i.e., uncertainty before minus uncertainty after, as in

    Gain(set(F = -1)) = entropy(M) - Σ_{G ∈ {-1,0,1}} P(G | set(F = -1)) entropy(M | set(F = -1), G)    (5)

where

    entropy(M) = - Σ_{i=0}^{4} P(M_i) log_2 P(M_i)    (6)

(this is the expected number of yes/no questions we would need to ask to guess the identity of a randomly selected model from P(M_i)) and

    entropy(M | set(F = -1), G) = - Σ_{i=0}^{4} P(M_i | set(F = -1), G) log_2 P(M_i | set(F = -1), G)    (7)

In our case we get

    Gain(set(F = -1)) = 1.3 bits
    Gain(set(G = -1)) = 1.5 bits
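Continuing the sketch, Problems 3 and 4 amount to a Bayes-rule update followed by an expected-entropy computation, directly mirroring equations (4) through (7); the numbers it prints reflect the placeholder tables, not the 1.3 and 1.5 bits from the notes:

    def entropy(p):
        # Shannon entropy in bits, eq. (6).
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    def information_gain(prior, table):
        # Eq. (5): prior entropy minus expected posterior entropy.
        p_outcome = prior @ table                  # P(outcome | experiment)
        expected = 0.0
        for k, p_k in enumerate(p_outcome):
            posterior = prior * table[:, k] / p_k  # Bayes rule, eq. (4)
            expected += p_k * entropy(posterior)   # eq. (7) inside the sum
        return entropy(prior) - expected

    for experiment, table in pred.items():
        print(experiment, round(information_gain(prior, table), 3), "bits")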

Note: there are many other ways of defining the gain from carrying out a specific experiment. The definition should depend on what you want to get out of carrying out the experiment. For example, suppose we are only interested in finding the most likely model and do not care whether we can rank the remaining models. In this case we could define the gain as

    Gain(set(F = -1)) = Σ_{G ∈ {-1,0,1}} P(G | set(F = -1)) [ max_j P(M_j | set(F = -1), G) - next best ]    (8)

and we would get (by plugging in the numbers)

    Gain(set(F = -1)) = ...
    Gain(set(G = -1)) = ...

[the two numerical values were lost in transcription]. This criterion would also select set(G = -1). In general the two criteria don't have to end up selecting the same experiment.
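The alternative criterion of equation (8) differs from the information gain only in how it scores the posterior; again continuing the sketch above:

    def best_vs_next_gain(prior, table):
        # Eq. (8): expected gap between the top two posterior probabilities.
        p_outcome = prior @ table
        total = 0.0
        for k, p_k in enumerate(p_outcome):
            posterior = np.sort(prior * table[:, k] / p_k)
            total += p_k * (posterior[-1] - posterior[-2])
        return total

    for experiment, table in pred.items():
        print(experiment, round(best_vs_next_gain(prior, table), 3))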
