Dobbiaco Lectures 2010 (2): Solved and Unsolved Problems in Biology
1 Dobbiaco Lectures 2010 (2): Solved and Unsolved Problems in Biology. Bud Mishra, Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA.
2 Outline
3 PART III: Uncertainty
4 Main theses
- A thought is a proposition with sense.
- A proposition is a truth-function of elementary propositions.
- The general form of a proposition is the general form of a truth function, which is: [p̄, ξ̄, N(ξ̄)].
- Where (or of what) one cannot speak, one must pass over in silence.
Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 1921.
5 Outline
6 Random Variables. A (discrete) random variable is a numerical quantity that, in some experiment involving randomness, takes a value from some (discrete) set of possible values. More formally, random variables are measurable maps X(ω), ω ∈ Ω, from a basic probability space (Ω, F, P), where Ω is the set of outcomes, F is a sigma field of subsets of Ω, and P is a probability measure on F. Events: {ω ∈ Ω : X(ω) = x_i} is the same as {X = x_i} [X assumes the value x_i].
7 A Few Examples. Example 1: Rolling two six-sided dice. The random variable might be the sum of the two numbers showing on the dice; its possible values are 2, 3, ..., 12. Example 2: Occurrence of a specific word GAATTC in a genome. The random variable might be the number of occurrences of this word in a random genome of a given length; its possible values are 0, 1, 2, ...
8 The Probability Distribution The probability distribution of a discrete random variable Y is the set of values that this random variable can take, together with the set of associated probabilities. Probabilities are numbers in the range between zero and one (inclusive) that always add up to one when summed over all possible values of the random variable.
9 Bernoulli Trial. A Bernoulli trial is a single trial with two possible outcomes: success and failure, with P(success) = p and P(failure) = 1 − p = q. The random variable S takes the value −1 if the trial results in failure and +1 if it results in success: P_S(s) = p^{(1+s)/2} q^{(1−s)/2}, s = −1, +1.
10 The Binomial Distribution. A binomial random variable is the number of successes in a fixed number n of independent Bernoulli trials, each with success probability p. If the random variable Y denotes the total number of successes in the n trials, then P_Y(y) = C(n, y) p^y q^{n−y}, y = 0, 1, ..., n.
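As a quick sanity check (a sketch, not part of the original slides), the binomial pmf can be computed directly with Python's standard library; summed over all y it should give one.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P_Y(y) = C(n, y) * p^y * q^(n - y), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, y) * p**y * q**(n - y)

# The pmf sums to one over y = 0, 1, ..., n.
total = sum(binomial_pmf(y, 10, 0.3) for y in range(11))
```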
11 The Poisson Distribution. A random variable Y has a Poisson distribution (with parameter λ > 0) if P_Y(y) = e^{−λ} λ^y / y!, y = 0, 1, ... The Poisson distribution often arises as a limiting form of the binomial distribution.
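The limiting relationship can be checked numerically (an illustrative sketch): hold λ = np fixed and let n grow, and the binomial pmf approaches the Poisson pmf pointwise.

```python
from math import comb, exp, factorial

def binomial_pmf(y, n, p):
    return comb(n, y) * p**y * (1.0 - p)**(n - y)

def poisson_pmf(y, lam):
    return exp(-lam) * lam**y / factorial(y)

# Hold lambda = n*p fixed at 2 and take n large; compare the two pmfs.
lam = 2.0
gap = max(abs(binomial_pmf(y, 10_000, lam / 10_000) - poisson_pmf(y, lam))
          for y in range(20))
```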
12 Continuous Random Variables. We denote a continuous random variable by X and an observed value of the random variable by x. Each random variable X with range I has an associated density function f_X(x), which is defined and positive for all x in I and integrates to one over the range I: Prob(a < X < b) = ∫_a^b f_X(x) dx.
13 The Normal Distribution. A random variable X has a normal or Gaussian distribution if it has range (−∞, ∞) and density function f_X(x) = (1 / (√(2π) σ)) e^{−(x−µ)² / (2σ²)}, where µ and σ > 0 are parameters of the distribution.
14 Expectation. For a random variable Y and any function g(Y) of Y, the expected value of g(Y) is E(g(Y)) = Σ_y g(y) P_Y(y) when Y is discrete, and E(g(Y)) = ∫ g(y) f_Y(y) dy when Y is continuous. Thus mean(Y) = E(Y) = µ(Y), and variance(Y) = E(Y²) − E(Y)² = σ²(Y).
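These definitions can be exercised on the two-dice example from slide 7 (a minimal sketch; the `expectation` helper is our own, not from the lecture).

```python
from itertools import product

def expectation(pmf, g=lambda y: y):
    """E[g(Y)] = sum over y of g(y) * P_Y(y), for a pmf given as {value: prob}."""
    return sum(g(y) * p for y, p in pmf.items())

# Probability distribution of the sum of two fair six-sided dice.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, 0.0) + 1.0 / 36.0

mean = expectation(pmf)                                    # E(Y)
variance = expectation(pmf, lambda y: y**2) - mean**2      # E(Y^2) - E(Y)^2
```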
15 Conditional Probabilities. Suppose that A_1 and A_2 are two events such that P(A_2) ≠ 0. Then the conditional probability that the event A_1 occurs, given that event A_2 occurs, denoted P(A_1 | A_2), is given by the formula P(A_1 | A_2) = P(A_1 ∩ A_2) / P(A_2).
16 Bayes Rule. Suppose that A_1 and A_2 are two events such that P(A_1) ≠ 0 and P(A_2) ≠ 0. Then P(A_2 | A_1) = P(A_2) P(A_1 | A_2) / P(A_1).
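A small worked example of the rule (the diagnostic-test numbers below are hypothetical, chosen only to illustrate the computation): A_2 = "has disease", A_1 = "tests positive".

```python
def bayes(p_a2, p_a1_given_a2, p_a1):
    """P(A2 | A1) = P(A2) * P(A1 | A2) / P(A1)."""
    return p_a2 * p_a1_given_a2 / p_a1

p_d = 0.01                 # hypothetical prevalence
p_pos_given_d = 0.95       # hypothetical sensitivity
p_pos_given_not_d = 0.05   # hypothetical false-positive rate

# P(A1) by the law of total probability.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1.0 - p_d)
p_d_given_pos = bayes(p_d, p_pos_given_d, p_pos)
```

Note how a highly accurate test still yields a modest posterior when the prior P(A_2) is small.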
17 Bayes Nets. Bayes nets, or Bayesian networks, are graphical representations of probabilistic relationships among a set of random variables. Given a finite set X = {X_1, ..., X_n} of discrete random variables, where each variable X_i may take values from a finite set denoted Val(X_i), a Bayesian network is an annotated directed acyclic graph (DAG) G that encodes a joint probability distribution over X.
18 The graph G (Bayesian network) is defined as follows: The nodes of the graph correspond to the random variables X_1, X_2, ..., X_n. The links of the graph correspond to the direct influence of one variable on another. 1. If there is a directed link from variable X_i to variable X_j, then X_i is a parent of X_j. 2. Each node is annotated with a conditional probability distribution (CPD) that represents P(X_i | Pa(X_i)), where Pa(X_i) denotes the parents of X_i in G.
19 The pair (G, CPD) encodes the joint distribution P(X_1, ..., X_n). A unique joint probability distribution over X is factorized from G as: P(X_1, ..., X_n) = ∏_i P(X_i | Pa(X_i)).
20
21 P(X_1 = no) = 0.8, P(X_1 = yes) = 0.2
P(X_2 = absent | X_1 = no) = 0.95, P(X_2 = absent | X_1 = yes) = 0.75
P(X_2 = present | X_1 = no) = 0.05, P(X_2 = present | X_1 = yes) = 0.25
P(X_3 = absent | X_1 = no) = …, P(X_3 = absent | X_1 = yes) = …
P(X_3 = present | X_1 = no) = …, P(X_3 = present | X_1 = yes) = …
P(X_4 = absent | X_2 = absent, X_3 = absent) = 0.95, P(X_4 = absent | X_2 = absent, X_3 = present) = 0.5
P(X_4 = absent | X_2 = present, X_3 = absent) = 0.9, P(X_4 = absent | X_2 = present, X_3 = present) = 0.25
P(X_4 = present | X_2 = absent, X_3 = absent) = 0.05, P(X_4 = present | X_2 = absent, X_3 = present) = 0.5
P(X_4 = present | X_2 = present, X_3 = absent) = 0.1, P(X_4 = present | X_2 = present, X_3 = present) = 0.75
P(X_5 = absent | X_3 = absent) = 0.98, P(X_5 = absent | X_3 = present) = 0.4
P(X_5 = present | X_3 = absent) = 0.02, P(X_5 = present | X_3 = present) = 0.6
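The factorization of slide 19 can be made concrete with these CPDs (a sketch; the transcription lost the numeric values for the X_3 CPD, so the X_3 entries below are hypothetical stand-ins, marked as such).

```python
from itertools import product

cpd_x1 = {"no": 0.8, "yes": 0.2}
cpd_x2 = {"no": {"absent": 0.95, "present": 0.05},
          "yes": {"absent": 0.75, "present": 0.25}}
cpd_x3 = {"no": {"absent": 0.99, "present": 0.01},    # hypothetical values
          "yes": {"absent": 0.97, "present": 0.03}}   # hypothetical values
cpd_x4 = {("absent", "absent"): {"absent": 0.95, "present": 0.05},
          ("absent", "present"): {"absent": 0.5, "present": 0.5},
          ("present", "absent"): {"absent": 0.9, "present": 0.1},
          ("present", "present"): {"absent": 0.25, "present": 0.75}}
cpd_x5 = {"absent": {"absent": 0.98, "present": 0.02},
          "present": {"absent": 0.4, "present": 0.6}}

def joint(x1, x2, x3, x4, x5):
    """P(X1..X5) = prod_i P(Xi | Pa(Xi)), following the DAG's parent sets."""
    return (cpd_x1[x1] * cpd_x2[x1][x2] * cpd_x3[x1][x3]
            * cpd_x4[(x2, x3)][x4] * cpd_x5[x3][x5])

# The joint sums to one over all 2 * 2^4 = 32 assignments.
total = sum(joint(a, b, c, d, e)
            for a in ("no", "yes")
            for b, c, d, e in product(("absent", "present"), repeat=4))
```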
22 Causal Bayesian Networks. A causal Bayesian network of a domain is similar to a normal Bayesian network; the difference lies in the interpretation of the links. In a normal Bayesian network, the links between variables can be interpreted as correlation or association. In a causal Bayesian network, the links mean that the parent variables causally influence the values of the child variables.
23 The causal influence here is defined by the manipulation criteria: Suppose there are two variables A and B in the domain. If we can manipulate the variables in the domain, setting the value of variable A to a_1 or a_2 and measuring the effect on variable B, then the probability distribution of B will differ under the different values of A: P(B | do(A = a_1)) ≠ P(B | do(A = a_2)).
24 Bayesian Network Structure Learning. The main task in Bayesian network structure learning is to find a structure that best describes the observed data. The problem is NP-complete, and many heuristics have been proposed. There are two categories of approaches: the score-and-search-based approach and the constraint-based approach.
25 The Score-and-Search-Based Approach. Methods in this category start from an initial structure (generated randomly or from domain knowledge) and move to the neighbor with the best score in the structure space, deterministically or stochastically, until a local maximum of the selected criterion is reached. The greedy learning process can restart several times with different initial structures to improve the result.
26 The Constraint-Based Approach. Methods in this category test the statistical significance of pairs of variables, conditioning on other variables, to induce conditional independence. Pairs of variables that pass some threshold are deemed directly connected in the Bayesian network. The complete network structure is constructed from the induced conditional independence and dependence information.
27 How Can Bayes Nets Be Causal? The causal Markov condition is a relative of Reichenbach's thesis that conditioning on common causes will render joint effects independent of one another. One can then add the assumption of faithfulness (or stability), and assume that all underlying systems of causal laws are deterministic. Similarly, using a causal minimality (or sufficiency) assumption, one may try to justify the claim that Bayesian nets are all there is to causality... "Bayes nets encode information about probabilistic independencies. Causality, if it has any connection with probability, would seem to be related to probabilistic dependence." (Cartwright, N.)
28 Probability Raising. Causes produce their effects; they make them happen. So, in the right kind of population, we can expect a higher frequency of the effect (E) when the cause (C) is present than when it is absent, and conversely for preventatives. What kinds of populations are the right kinds? Populations in which the requisite causal process sometimes operates unimpeded, and its doing so is not correlated with other processes that mask the increase in probability, such as the presence of a process preventing the effect or the absence of another positive process. (Cartwright)
29 PART IV: Causality
30 Outline
31
32 The law of causality... is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm... Bertrand Russell, On the Notion of Cause. Proceedings of the Aristotelian Society 13: 1-26, 1913.
33 Causality and Correlation. A fallacy, known as cum hoc ergo propter hoc (Latin for "with this, therefore because of this"): correlation does not imply causation. The probability-raising condition + the temporal priority condition.
34 Regularity Theories (David Hume) Causes are invariably followed by their effects: We may define a cause to be an object, followed by another, and where all the objects similar to the first, are followed by objects similar to the second. Attempts to analyze causation in terms of invariable patterns of succession are referred to as regularity theories of causation. There are a number of well-known difficulties with regularity theories, and these may be used to motivate probabilistic approaches to causation.
35 Imperfect Regularities. The first difficulty is that most causes are not invariably followed by their effects. Penetrance: the presence of a disease allele does not always lead to a disease phenotype. Probabilistic theories of causation simply require that causes raise the probability of their effects; an effect may still occur in the absence of a cause, or fail to occur in its presence. Thus smoking is a cause of lung cancer, not because all smokers develop lung cancer, but because smokers are more likely to develop lung cancer than non-smokers.
36 Imperfect Regularities: the INUS Condition. John Stuart Mill and John Mackie offer more refined accounts of the regularities that underwrite causal relations. An INUS condition for some effect is an Insufficient but Non-redundant part of an Unnecessary but Sufficient condition. This complexity raises problems for the epistemology of causation.
37 INUS condition Suppose, for example, that a lit match causes a forest fire. The lighting of the match, by itself, is not sufficient; many matches are lit without ensuing forest fires. The lit match is, however, a part of some constellation of conditions that are jointly sufficient for the fire. Moreover, given that this set of conditions occurred, rather than some other set sufficient for fire, the lighting of the match was necessary: fires do not occur in such circumstances when lit matches are not present. Epistasis, and gene-environment interaction.
38 Asymmetry. Causality is usually asymmetric: if A causes B then, typically, B will not also cause A. This poses a problem for regularity theories, for it seems quite plausible that if smoking is an INUS condition for lung cancer, then lung cancer will be an INUS condition for smoking. One way of enforcing the asymmetry of causation is to stipulate that causes precede their effects in time.
39 Spurious Regularities. Suppose that a cause is regularly followed by two effects. For instance, a particular allele A is pleiotropic: it causes a disease trait, but also transcription of another gene B. B may be mistakenly thought to be causing the disease; B is also an INUS condition for the disease state, but it is not a cause. Whenever the barometric pressure drops below a certain level, two things happen: first, the height of the column of mercury in a barometer drops; shortly afterwards, a storm occurs. Then it may well also be the case that whenever the column of mercury drops, there will be a storm.
40 Causes raise the probability of their effects. This can be expressed formally using the apparatus of conditional probability. Let A, B, C, ... represent factors that potentially stand in causal relations, and let Pr be a probability function such that Pr(A) represents the empirical probability that factor A occurs or is instantiated. Recall that Pr(B | A) represents the conditional probability of B given A: Pr(B | A) = Pr(A ∩ B) / Pr(A).
41 If Pr(A) = 0, then the ratio in the definition of conditional probability is undefined. (There are other ways of handling this formally.) To say that A raises the probability of B is to say that Pr(B | A) > Pr(B | ¬A). PR Axiom: A causes B if and only if Pr(B | A) > Pr(B | ¬A).
42 Problems. Probability-raising is symmetric: if Pr(B | A) > Pr(B | ¬A), then Pr(A | B) > Pr(A | ¬B). The causal relation, however, is typically asymmetric. Probability-raising also has trouble with spurious correlations: if A and B are both caused by some third factor C, then it may be that Pr(B | A) > Pr(B | ¬A) even though A does not cause B. Those with yellow-stained fingers are more likely to suffer from lung cancer; smoking tends to produce both effects. Intuitively, the way to address this problem is to require that causes raise the probabilities of their effects ceteris paribus.
43 Spurious Correlations: NSO. Screening off: if Pr(B | A ∧ C) = Pr(B | C), then C is said to screen A off from B. Equivalently, (A ⊥ B) | C. To avoid the problem of spurious correlations, add a "no screening off" (NSO) condition: factor A, occurring at time t, is a cause of the later factor B if and only if: (1) Pr(B | A) > Pr(B | ¬A); and (2) there is no factor C, occurring earlier than or simultaneously with A, that screens A off from B.
44 Yule-Simpson Effect. NSO does not suffice to resolve the problem of spurious correlations. Suppose, for example, that smoking is highly correlated with exercise: those who smoke are much more likely to exercise as well. Smoking is a cause of heart disease, but suppose that exercise is an even stronger preventative of heart disease. Then it may be that smokers are, overall, less likely to suffer from heart disease than non-smokers. With A = smoking, C = exercise, and B = heart disease: Pr(B | A) < Pr(B | ¬A). Note, however, that if we conditionalize on whether one exercises or not, this inequality is reversed: Pr(B | A ∧ C) > Pr(B | ¬A ∧ C) and Pr(B | A ∧ ¬C) > Pr(B | ¬A ∧ ¬C).
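The reversal is easy to exhibit numerically (the probabilities below are hypothetical, chosen only so that the effect appears): smoking raises heart-disease risk within each exercise stratum, yet lowers it marginally.

```python
# A = smoking, B = heart disease, C = exercise. Hypothetical numbers.
p_c_given_a = {True: 0.9, False: 0.1}            # smokers exercise far more (assumed)
p_b_given_ac = {(True, True): 0.2, (False, True): 0.1,     # exercisers
                (True, False): 0.8, (False, False): 0.7}   # non-exercisers

def p_b_given_a(a):
    """Marginalize over C: P(B|A) = sum_c P(B | A, C=c) * P(C=c | A)."""
    pc = p_c_given_a[a]
    return p_b_given_ac[(a, True)] * pc + p_b_given_ac[(a, False)] * (1.0 - pc)

overall_smoker = p_b_given_a(True)       # 0.9*0.2 + 0.1*0.8 = 0.26
overall_nonsmoker = p_b_given_a(False)   # 0.1*0.1 + 0.9*0.7 = 0.64
```

Within each stratum Pr(B | A ∧ C) > Pr(B | ¬A ∧ C), yet overall Pr(B | A) < Pr(B | ¬A).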
45 Test Situations. Causes must raise the probability of their effects in test situations. TS: A causes B if Pr(B | A ∧ T) > Pr(B | ¬A ∧ T) for every test situation T. A test situation is a conjunction of factors that are held fixed. This suggests that in evaluating the causal relevance of A for B, we need to hold fixed the other causes of B, either positively or negatively.
46 PART V: Evolution
47 Outline
48
49 Evolution of Regulation. How did a complex regulatory system like the Lac operon evolve? Proximate vs. ultimate answers... (Intelligent) design principles in biology, at a systems level? Teleological intent in biology? Could evolution have anticipated lactose-producing mammals when it evolved E. coli? If glucose disappears in the future, will the E. coli system evolve to discard the complex regulation of diauxie?
50 Mechanisms. Cairns and Foster: start with a population of E. coli lac frameshift mutants. If you starve it on a medium including lactose, it appears to specifically accumulate Lac+ revertants. Adaptive mutation?
51 Luria-Delbrück Jackpot. 1. Causal explanation: a stress-induced general mutagenesis mechanism in a subpopulation of starved cells causes this trait to emerge (the hypermutable-state model). 2. Acausal explanation: stress has no direct effect on mutability but favors only growth of cells that amplify their leaky mutant lac region (the amplification mutagenesis model). Selection enhances reversion primarily by increasing the mutant lac copy number within each developing clone on the selection plate. The observed general mutagenesis is attributed to a side effect of growth with an amplification: induction of SOS by DNA fragments released from a tandem array of lac copies. 3. Which one is correct?
52 Acausal Explanation: P[M(t + Δt) | S(t)] = P[M(t + Δt) | ¬S(t)] = µΔt + o(Δt²).
53 Causal Explanation: P[M(t + τ) | S(t)] = ρ and P[M(t + τ) | ¬S(t)] = µτ. And P[M(t + Δt) | ¬S(t)] = µΔt + o(Δt²).
54 Statistics in the Causal System.
1. Initial population size = N_0; growth rate is β: dN(t) = βN(t)dt, so dN(t)/N(t) = βdt and N_t = N_0 e^{βt}.
2. R^c_t = (ρN_0 e^{βτ}) e^{β(t−τ)}, t > τ.
3. Poisson approximation: E(R^c_t) = ρN_0 e^{βt}, Var(R^c_t) = ρN_0 e^{βt}.
55 Statistics in the Acausal System.
1. Mutations occur randomly at a rate proportional to N_t. Thus these mutations occur in accordance with a Poisson process whose intensity is ν(t) = µe^{βt}.
2. The number of mutations occurring in the time interval [0, t) is M_t, with expectation E(M_t) = ∫_0^t ν(s) ds = (µ/β)(e^{βt} − 1).
56
1. Hence R^a_t = 0 if M(t) = 0, and R^a_t = Σ_{i=1}^{M(t)} N_0 e^{β(t − t_i)} otherwise.
2. Luria-Delbrück distribution: E(R^a_t) = µtN_0 e^{βt}, Var(R^a_t) = (µ/β) N_0 e^{βt} (e^{βt} − 1).
57 Jackpots and Fat Tails. Var(R^a_t)/E(R^a_t) = (e^{βt} − 1)/(βt), while Var(R^c_t)/E(R^c_t) = 1.
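The contrast between the two variance-to-mean ratios can be seen in a Monte Carlo sketch (all parameter values below are arbitrary illustrations, not from the lecture): the causal model yields Poisson-like counts with Var/E ≈ 1, while the acausal model produces jackpot-dominated totals with a far larger ratio.

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's method; adequate for the small lambda used here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def causal_run(rng, rho=0.05, beta=1.0, t=5.0, n0=1.0):
    """Hypermutable-state model: mutant count ~ Poisson(rho * N0 * e^{beta t})."""
    return poisson_sample(rho * n0 * math.exp(beta * t), rng)

def acausal_run(rng, mu=0.05, beta=1.0, t=5.0, n0=1.0):
    """Mutations arise during growth; each founds a clone that keeps growing."""
    g = math.exp(beta * t) - 1.0
    k = poisson_sample(mu / beta * g, rng)   # M_t ~ Poisson((mu/beta)(e^{beta t} - 1))
    total = 0.0
    for _ in range(k):
        u = rng.random()                     # mutation time, density proportional to e^{beta s}
        total += n0 * math.exp(beta * t) / (1.0 + u * g)   # clone size N0 e^{beta (t - t_i)}
    return total

def var_over_mean(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs) / m

rng = random.Random(0)
causal = [causal_run(rng) for _ in range(20000)]
acausal = [acausal_run(rng) for _ in range(20000)]
```

With β = 1 and t = 5, the theoretical acausal ratio is (e^{βt} − 1)/(βt) ≈ 29.5, versus 1 for the causal model.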
58
59 History. Historically, the Luria-Delbrück experiment involved pure bacterial cultures (starting with a single bacterium) attacked by a bacterial virus (bacteriophage). They noticed that the culture would clear after a few hours, due to destruction of the sensitive cells by the virus. However, after further incubation for a few hours, or sometimes days, the culture would become turbid again, due to growth of a bacterial variant resistant to the action of the virus.
60 History. The observations reminded them of a casino, where a gambler would almost always lose his bets, only very occasionally hitting a jackpot. They derived the distribution of surviving cultures (approximately) to distinguish between two hypotheses: the Mutation Hypothesis vs. the Acquired Hereditary Immunity Hypothesis.
61 PART VI: Time
62 Outline
63 Markov Models. Suppose there are n states S_1, S_2, ..., S_n, and the probability of moving to a state S_j from a state S_i depends only on S_i, and not on the previous history. That is: P(s(t+1) = S_j | s(t) = S_i, s(t−1) = S_{i_1}, ...) = P(s(t+1) = S_j | s(t) = S_i). Then, by the chain rule of conditional probability: P(s(0) = S_{i_0}, s(1) = S_{i_1}, ..., s(t−1) = S_{i_{t−1}}, s(t) = S_{i_t}) = P(s(0) = S_{i_0}) P(S_{i_1} | S_{i_0}) ⋯ P(S_{i_t} | S_{i_{t−1}}).
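The chain-rule factorization can be sketched on a toy two-state chain (states and numbers below are illustrative, not from the lecture):

```python
from itertools import product

states = ("rain", "sun")
init = {"rain": 0.5, "sun": 0.5}
trans = {("rain", "rain"): 0.7, ("rain", "sun"): 0.3,
         ("sun", "rain"): 0.4, ("sun", "sun"): 0.6}

def path_probability(path):
    """P(s(0), ..., s(t)) = P(s(0)) * prod over k of P(s(k) | s(k-1))."""
    p = init[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= trans[(prev, cur)]
    return p

p = path_probability(["rain", "rain", "sun"])   # 0.5 * 0.7 * 0.3
# All length-3 state sequences form a probability distribution.
total = sum(path_probability(list(w)) for w in product(states, repeat=3))
```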
64 HMM: defined with respect to an alphabet Σ: a set of (hidden) states Q, a |Q| × |Q| matrix of state transition probabilities A = (a_kl), and a |Q| × |Σ| matrix of emission probabilities E = (e_k(σ)). Q is a set of states that emit symbols from the alphabet Σ; the dynamics is determined by a state-space trajectory governed by the state-transition probabilities.
65 A Path in the HMM. A path Π = π_1 π_2 ⋯ π_n is a sequence of states of Q in the hidden Markov model M. For a sequence x over Σ generated along the path Π determined by the model M: P(x | Π) = P(π_1) ∏_{i=1}^n P(x_i | π_i) P(π_{i+1} | π_i).
66 A Path in the HMM. Note that in P(x | Π) = P(π_1) ∏_{i=1}^n P(x_i | π_i) P(π_{i+1} | π_i), we have P(x_i | π_i) = e_{π_i}(x_i) and P(π_{i+1} | π_i) = a_{π_i, π_{i+1}}. Let π_0 and π_{n+1} be the initial ("begin") and final ("end") states, respectively. Then P(x | Π) = a_{π_0, π_1} e_{π_1}(x_1) a_{π_1, π_2} e_{π_2}(x_2) ⋯ e_{π_n}(x_n) a_{π_n, π_{n+1}}, i.e., P(x | Π) = a_{π_0, π_1} ∏_{i=1}^n e_{π_i}(x_i) a_{π_i, π_{i+1}}.
67 Decoding Problem. For a given sequence x and a given path π, the (Markovian) model defines the probability P(x | π). In a casino scenario: the dealer knows Π and x, while the player knows x but not Π; the path of x is hidden. Decoding problem: find an optimal path π* for x such that P(x | π) is maximized: π* = arg max_π P(x | π).
68 Dynamic Programming Approach. Principle of optimality: an optimal path for the (i+1)-prefix x_1 x_2 ⋯ x_{i+1} of x uses a path for an i-prefix of x that is optimal among the paths ending in an unknown state π_i = k ∈ Q.
69 Dynamic Programming Approach. Recurrence: let s_k(i) be the probability of the most probable path for the i-prefix ending in state k, for k ∈ Q and 1 ≤ i ≤ n. Then s_k(i) = e_k(x_i) · max_{l ∈ Q} s_l(i−1) a_{lk}.
70 Dynamic Programming.
Base case (i = 0): s_begin(0) = 1; s_k(0) = 0 for k ≠ begin.
Inductive case (0 < i ≤ n): s_l(i+1) = e_l(x_{i+1}) max_{k ∈ Q} [s_k(i) a_kl].
Termination (i = n + 1): P(x | π*) = max_{k ∈ Q} s_k(n) a_{k,end}.
71 Viterbi Algorithm. Dynamic programming with a log-score function S_l(i) = log s_l(i). Space complexity = O(n|Q|); time complexity = O(n|Q|²). Additive formula: S_l(i+1) = log e_l(x_{i+1}) + max_{k ∈ Q} [S_k(i) + log a_kl].
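The recurrence above can be sketched end to end (a minimal implementation, not the lecture's own code; it folds the begin state into an initial distribution and takes the best final state rather than modeling an explicit end state). The dishonest-casino parameters are the usual textbook choices.

```python
import math

def viterbi(x, states, a0, a, e):
    """Log-space Viterbi: most probable state path pi* for sequence x.

    a0[k]     : initial probability of state k
    a[(k, l)] : transition probability k -> l
    e[k][sym] : emission probability of sym in state k
    """
    S = {k: math.log(a0[k]) + math.log(e[k][x[0]]) for k in states}
    back = []                       # back-pointers for traceback
    for sym in x[1:]:
        S_next, ptr = {}, {}
        for l in states:
            best = max(states, key=lambda k: S[k] + math.log(a[(k, l)]))
            ptr[l] = best
            S_next[l] = math.log(e[l][sym]) + S[best] + math.log(a[(best, l)])
        back.append(ptr)
        S = S_next
    last = max(states, key=S.get)   # best final state
    path = [last]
    for ptr in reversed(back):      # trace the optimal path backwards
        path.append(ptr[path[-1]])
    return S[last], path[::-1]

# Fair (F) vs. loaded (L) die; the loaded die shows "6" half the time.
states = ("F", "L")
a0 = {"F": 0.5, "L": 0.5}
a = {("F", "F"): 0.95, ("F", "L"): 0.05,
     ("L", "F"): 0.10, ("L", "L"): 0.90}
e = {"F": {s: 1 / 6 for s in "123456"},
     "L": {**{s: 1 / 10 for s in "12345"}, "6": 1 / 2}}

score, path = viterbi("666666", states, a0, a, e)
```

On a run of six sixes, the decoder prefers staying in the loaded state throughout, since 0.5 · 0.5⁶ · 0.9⁵ dominates every competing path.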
72 [End of Lecture #2]
More informationStructure learning in human causal induction
Structure learning in human causal induction Joshua B. Tenenbaum & Thomas L. Griffiths Department of Psychology Stanford University, Stanford, CA 94305 jbt,gruffydd @psych.stanford.edu Abstract We use
More informationOutline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012
CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline
More informationIntelligent Systems: Reasoning and Recognition. Reasoning with Bayesian Networks
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2016/2017 Lesson 13 24 march 2017 Reasoning with Bayesian Networks Naïve Bayesian Systems...2 Example
More informationLecture 10: Introduction to reasoning under uncertainty. Uncertainty
Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,
More information15-381: Artificial Intelligence. Hidden Markov Models (HMMs)
15-381: Artificial Intelligence Hidden Markov Models (HMMs) What s wrong with Bayesian networks Bayesian networks are very useful for modeling joint distributions But they have their limitations: - Cannot
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationCausal Bayesian networks. Peter Antal
Causal Bayesian networks Peter Antal antal@mit.bme.hu A.I. 4/8/2015 1 Can we represent exactly (in)dependencies by a BN? From a causal model? Suff.&nec.? Can we interpret edges as causal relations with
More informationCOMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma
COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods
More informationHidden Markov Models (I)
GLOBEX Bioinformatics (Summer 2015) Hidden Markov Models (I) a. The model b. The decoding: Viterbi algorithm Hidden Markov models A Markov chain of states At each state, there are a set of possible observables
More information4.1 Notation and probability review
Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall
More informationFrom Causality, Second edition, Contents
From Causality, Second edition, 2009. Preface to the First Edition Preface to the Second Edition page xv xix 1 Introduction to Probabilities, Graphs, and Causal Models 1 1.1 Introduction to Probability
More information1 The Basic Counting Principles
1 The Basic Counting Principles The Multiplication Rule If an operation consists of k steps and the first step can be performed in n 1 ways, the second step can be performed in n ways [regardless of how
More informationCausal Inference & Reasoning with Causal Bayesian Networks
Causal Inference & Reasoning with Causal Bayesian Networks Neyman-Rubin Framework Potential Outcome Framework: for each unit k and each treatment i, there is a potential outcome on an attribute U, U ik,
More informationTopic 4: Causation and conditionals
Causation and conditionals http://philosophy.ucsd.edu/faculty/wuthrich/ 130 Metaphysics Fall 2012 Conditionals: If p, then q. Subjunctive and indicative conditionals Analyzing counterfactuals Closeness
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationHidden Markov Models. Hosein Mohimani GHC7717
Hidden Markov Models Hosein Mohimani GHC7717 hoseinm@andrew.cmu.edu Fair et Casino Problem Dealer flips a coin and player bets on outcome Dealer use either a fair coin (head and tail equally likely) or
More informationAppendix A Lewis s Counterfactuals
Appendix A Lewis s Counterfactuals I will briefly describe David Lewis s possible world semantics of counterfactual conditionals (1973, p. 13). Let us understand a possible world simply as a way things
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationTools for causal analysis and philosophical theories of causation. Isabelle Drouet (IHPST)
Tools for causal analysis and philosophical theories of causation Isabelle Drouet (IHPST) 1 What is philosophy of causation? 1. What is causation? What characterizes these relations that we label "causal"?
More informationCMPT Machine Learning. Bayesian Learning Lecture Scribe for Week 4 Jan 30th & Feb 4th
CMPT 882 - Machine Learning Bayesian Learning Lecture Scribe for Week 4 Jan 30th & Feb 4th Stephen Fagan sfagan@sfu.ca Overview: Introduction - Who was Bayes? - Bayesian Statistics Versus Classical Statistics
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationUncertainty and Bayesian Networks
Uncertainty and Bayesian Networks Tutorial 3 Tutorial 3 1 Outline Uncertainty Probability Syntax and Semantics for Uncertainty Inference Independence and Bayes Rule Syntax and Semantics for Bayesian Networks
More information2. AXIOMATIC PROBABILITY
IA Probability Lent Term 2. AXIOMATIC PROBABILITY 2. The axioms The formulation for classical probability in which all outcomes or points in the sample space are equally likely is too restrictive to develop
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationWhat s an HMM? Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) Hidden Markov Models (HMMs) for Information Extraction
Hidden Markov Models (HMMs) for Information Extraction Daniel S. Weld CSE 454 Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) standard sequence model in genomics, speech, NLP, What
More informationIntroduction to Hidden Markov Models for Gene Prediction ECE-S690
Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence
More informationBayesian Networks and Markov Random Fields
Bayesian Networks and Markov Random Fields 1 Bayesian Networks We will use capital letters for random variables and lower case letters for values of those variables. A Bayesian network is a triple V, G,
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Classification: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More informationOn Likelihoodism and Intelligent Design
On Likelihoodism and Intelligent Design Sebastian Lutz Draft: 2011 02 14 Abstract Two common and plausible claims in the philosophy of science are that (i) a theory that makes no predictions is not testable
More informationCSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationProbability. CS 3793/5233 Artificial Intelligence Probability 1
CS 3793/5233 Artificial Intelligence 1 Motivation Motivation Random Variables Semantics Dice Example Joint Dist. Ex. Axioms Agents don t have complete knowledge about the world. Agents need to make decisions
More informationBayesian Networks Factor Graphs the Case-Factor Algorithm and the Junction Tree Algorithm
Bayesian Networks Factor Graphs the Case-Factor Algorithm and the Junction Tree Algorithm 1 Bayesian Networks We will use capital letters for random variables and lower case letters for values of those
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)
More information5. Conditional Distributions
1 of 12 7/16/2009 5:36 AM Virtual Laboratories > 3. Distributions > 1 2 3 4 5 6 7 8 5. Conditional Distributions Basic Theory As usual, we start with a random experiment with probability measure P on an
More informationCHAPTER : Prokaryotic Genetics
CHAPTER 13.3 13.5: Prokaryotic Genetics 1. Most bacteria are not pathogenic. Identify several important roles they play in the ecosystem and human culture. 2. How do variations arise in bacteria considering
More informationProbabilistic Models
Bayes Nets 1 Probabilistic Models Models describe how (a portion of) the world works Models are always simplifications May not account for every variable May not account for all interactions between variables
More information4. Conditional Probability
1 of 13 7/15/2009 9:25 PM Virtual Laboratories > 2. Probability Spaces > 1 2 3 4 5 6 7 4. Conditional Probability Definitions and Interpretations The Basic Definition As usual, we start with a random experiment
More informationLecture 9: Bayesian Learning
Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Brute-force MAP LEARNING, MDL principle, Bayes Optimal
More informationIntroduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah
Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationRepresentation. Stefano Ermon, Aditya Grover. Stanford University. Lecture 2
Representation Stefano Ermon, Aditya Grover Stanford University Lecture 2 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 2 1 / 32 Learning a generative model We are given a training
More informationTemplate-Based Representations. Sargur Srihari
Template-Based Representations Sargur srihari@cedar.buffalo.edu 1 Topics Variable-based vs Template-based Temporal Models Basic Assumptions Dynamic Bayesian Networks Hidden Markov Models Linear Dynamical
More information1 Presessional Probability
1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional
More informationReasoning Under Uncertainty Over Time. CS 486/686: Introduction to Artificial Intelligence
Reasoning Under Uncertainty Over Time CS 486/686: Introduction to Artificial Intelligence 1 Outline Reasoning under uncertainty over time Hidden Markov Models Dynamic Bayes Nets 2 Introduction So far we
More informationRecall from last time. Lecture 3: Conditional independence and graph structure. Example: A Bayesian (belief) network.
ecall from last time Lecture 3: onditional independence and graph structure onditional independencies implied by a belief network Independence maps (I-maps) Factorization theorem The Bayes ball algorithm
More informationLearning Bayesian Networks Does Not Have to Be NP-Hard
Learning Bayesian Networks Does Not Have to Be NP-Hard Norbert Dojer Institute of Informatics, Warsaw University, Banacha, 0-097 Warszawa, Poland dojer@mimuw.edu.pl Abstract. We propose an algorithm for
More informationBayesian networks as causal models. Peter Antal
Bayesian networks as causal models Peter Antal antal@mit.bme.hu A.I. 3/20/2018 1 Can we represent exactly (in)dependencies by a BN? From a causal model? Suff.&nec.? Can we interpret edges as causal relations
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K HiSeq X & NextSeq Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization:
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Kyu-Baek Hwang and Byoung-Tak Zhang Biointelligence Lab School of Computer Science and Engineering Seoul National University Seoul 151-742 Korea E-mail: kbhwang@bi.snu.ac.kr
More informationCS 7180: Behavioral Modeling and Decision- making in AI
CS 7180: Behavioral Modeling and Decision- making in AI Learning Probabilistic Graphical Models Prof. Amy Sliva October 31, 2012 Hidden Markov model Stochastic system represented by three matrices N =
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Chapter 8&9: Classification: Part 3 Instructor: Yizhou Sun yzsun@ccs.neu.edu March 12, 2013 Midterm Report Grade Distribution 90-100 10 80-89 16 70-79 8 60-69 4
More informationOutline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination
Probabilistic Graphical Models COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:
More informationIntroduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah
Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,
More information