One-shot Learning of Poisson Distributions: Information Theory of the Audic-Claverie Statistic for Analyzing cDNA Arrays


Slide 1: One-shot Learning of Poisson Distributions: Information Theory of the Audic-Claverie Statistic for Analyzing cDNA Arrays
Peter Tiňo, School of Computer Science, University of Birmingham, UK

Slide 2: cDNA array analysis
Biologists analyze patterns of expression levels of selected genes in different tissues, possibly obtained under different conditions or treatment regimes.
Measurement of gene expression levels:
- via hybridization to microarrays, or
- by counting gene tags (signatures), using e.g. the Serial Analysis of Gene Expression (SAGE) or Massively Parallel Signature Sequencing (MPSS) methodologies.

Slide 3: SAGE
The SAGE procedure results in a library of short sequence tags, each representing an expressed gene.
Key assumption: every mRNA copy in the tissue has the same chance of ending up as a tag in the library. Selecting a specific tag from the pool of transcripts can therefore be approximately treated as sampling with replacement.
Key step in many SAGE studies: identification of interesting genes, typically those that are differentially expressed under different conditions/treatments. One compares the number of specific tags found in two SAGE libraries corresponding to different conditions or treatments.

Slide 4: The approach of Audic and Claverie
Audic and Claverie were among the first to systematically study the influence of random fluctuations and sampling size on the reliability of digital expression profile data.
It is a popular approach in current biological research: 427 citations (ISI Web of Knowledge), over 100 citations in the past 3 years.
Typically, cDNA libraries contain a large number of different expressed genes, so observing a given cDNA qualifies as a rare event.

Slide 5: A-C approach
Consider a transcript representing a small fraction of the library and a large number N of clones. The probability of observing x tags of the same gene is well-approximated by the Poisson distribution parametrized by λ ≥ 0:
P(X = x | λ) = e^(−λ) λ^x / x!
The unknown parameter λ signifies the number of transcripts of the given type (tag) per N clones in the cDNA library.
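A minimal numerical sketch of the Poisson model above (ours, not part of the slides; the helper name poisson_pmf is our own):

from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    # P(X = x | lambda) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam ** x / factorial(x)

# e.g. the probability of observing x = 3 tags when lambda = 2:
print(poisson_pmf(3, 2.0))  # ~0.1804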

Slide 6: A-C approach (cont'd)
Null hypothesis of not-differentially-expressed genes: the tag count x in one library comes from the same underlying Poisson distribution P(· | λ) as the tag count y in the other library.
Each SAGE library represents a single (count) measurement only! From a purely statistical standpoint, resolving this issue is potentially quite problematic...
Key instrument of the A-C approach: a distribution P(y | x) over tag counts y in one library, informed by the tag count x in the other library, under the null hypothesis that the tag counts are generated from the same but unknown Poisson distribution.

Slide 7: So what do we really want to do?
[Cartoon: two Poisson "machines" with unknown rates λ1 and λ2 are each pressed once, yielding single counts x and y. From this one-shot evidence we are asked to decide whether λ1 = λ2. "Are you crazy??"]

Slide 8: P(y | x)
P(y | x) = ∫ p(y, λ | x) dλ
         = ∫ P(y | λ, x) p(λ | x) dλ
         = ∫ P(y | λ) [ P(x | λ) p(λ) / ∫_0^∞ P(x | λ') p(λ') dλ' ] dλ.
Imposing a flat prior p(λ) over the Poisson parameter λ results in
P(y | x) = (1/y!) ∫_0^∞ e^(−2λ) λ^(x+y) dλ / ∫_0^∞ e^(−λ) λ^x dλ.

Slide 9: A-C statistic
Since the Gamma distribution parametrized by a, b > 0 takes the form
Gamma(λ | a, b) = (1/Γ(a)) b^a λ^(a−1) e^(−bλ),
where Γ(a) = ∫_0^∞ u^(a−1) e^(−u) du is the Gamma function, we have
P(y | x) = (1/y!) Γ(x + y + 1) / (2^(x+y+1) Γ(x + 1)),
which, since x and y are integers (i.e. Γ(x + 1) = x!), can be rewritten as
P(y | x) = (1/2^(x+y+1)) (x + y)! / (x! y!) = (1/2^(x+y+1)) C(x + y, x).
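A small sketch (ours) of the closed form above, using math.comb for the binomial coefficient; it also checks that P(· | x) is a proper distribution and previews the symmetry discussed on the next slide:

from math import comb

def ac_statistic(y: int, x: int) -> float:
    # P(y | x) = C(x + y, x) / 2^(x + y + 1)
    return comb(x + y, x) / 2 ** (x + y + 1)

x = 10
total = sum(ac_statistic(y, x) for y in range(500))
print(abs(total - 1.0) < 1e-12)                  # True: probabilities sum to 1
print(ac_statistic(7, 3) == ac_statistic(3, 7))  # True: P(y | x) = P(x | y)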

Slide 10: A-C statistic (cont'd)
P(y | x) is symmetric: for x, y ≥ 0, P(y | x) = P(x | y).
This is a desirable property: if the counts x, y are related to two libraries of the same size, they should be interchangeable when analyzing whether they come from the same underlying process or not.
The A-C statistic can be used e.g. for principled inference, construction of confidence intervals, statistical testing, etc.
We ask:
1. How natural is the A-C statistic's representation of the underlying unknown Poisson distribution governing the tag counts?
2. Given that the observed tag count sample is very limited, how well can the Audic-Claverie approach work, i.e. how well does the A-C statistic capture the underlying Poisson distribution?

Slide 11: Poisson distribution vs A-C statistic
[Figure: graphs of the A-C statistic P(y | x) (solid line) and the corresponding Poisson distribution P(y | λ) at λ = x (dashed line), for x = 10 (a) and x = 30 (b).]

Slide 12: A-C statistic: mode structure
The A-C statistic and the underlying Poisson distribution are quite similar in their nature.
For any integer mean tag count λ ≥ 1, the Poisson distribution P(· | λ) has two neighboring modes located at λ and λ − 1, with P(λ | λ) = P(λ − 1 | λ).
Analogously, after observing a count x, the A-C statistic expects the counts y = x and y = x − 1 with the highest (and equal) probability. The other values of the count y are less probable.

Slide 13: A-C statistic: mode structure (cont'd)
Theorem 1. Let x, y and d be integers with ranges specified below. It holds:
1. P(x | x) > P(x + d | x) for any x ≥ 0 and d ≥ 1.
2. For x ≥ 1, P(x | x) = P(x − 1 | x).
3. P(x | x) > P(x − d | x) for any x ≥ 2 and 2 ≤ d ≤ x.
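A quick exhaustive check (ours) of the three claims of Theorem 1, reusing the ac_statistic helper from the earlier sketch:

# verify Theorem 1 for all x < 50 (claim 1 tested for d up to 50)
for x in range(50):
    assert all(ac_statistic(x, x) > ac_statistic(x + d, x) for d in range(1, 51))
    if x >= 1:
        assert ac_statistic(x, x) == ac_statistic(x - 1, x)
    if x >= 2:
        assert all(ac_statistic(x, x) > ac_statistic(x - d, x) for d in range(2, x + 1))
print("Theorem 1 holds for all tested x")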

Slide 14: A-C statistic: mean and variance
Theorem 2. Consider a non-negative integer x and the associated A-C statistic P(y | x). Then it holds:
1. E_{P(y|x)}[y] = x + 1.
2. Var_{P(y|x)}[y] = E_{P(y|x)}[(y − E_{P(y|x)}[y])²] = 2 E_{P(y|x)}[y].
Note that for the Poisson distribution P(y | λ = x): E[y] = Var[y] = x.
The larger variance of P(· | x) is the result of Bayesian averaging of Poisson distributions under a flat prior.
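The mean and variance in Theorem 2 can be confirmed numerically with the same helper (our sketch):

x = 10
mean = sum(y * ac_statistic(y, x) for y in range(1000))
var = sum((y - mean) ** 2 * ac_statistic(y, x) for y in range(1000))
print(round(mean, 6), round(var, 6))  # 11.0, 22.0: that is, x + 1 and 2(x + 1)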

Slide 15: A-C statistic: information theory
Assume that there is some true underlying Poisson distribution P(y | λ) over possible counts y ≥ 0, with unknown mean λ. From the same process we first generate a count x and then use the A-C statistic P(y | x) to define a distribution over y, given the already observed count x.
We ask: how different, in terms of Kullback-Leibler (K-L) divergence, are the two distributions P(y | x) and P(y | λ) over y?
For the A-C approach to work, one would like P(y | x) to be sufficiently representative of the true unknown distribution P(y | λ).

Slide 16: Thought experiment 1
[Cartoon: the environment P(· | λ) is pressed once, producing a single count x and hence the statistic P(· | x). "Let's see how close you can get." "What do you mean by 'close'?" The quantity of interest is the "similarity" between P(· | λ) and P(· | x).]

Slide 17: K-L divergence from P(y | λ) to P(y | x)
D(λ, x) = D_KL[P(y | λ) || P(y | x)] = Σ_{y=0}^∞ P(y | λ) log [ P(y | λ) / P(y | x) ].
We have
D(λ, x) = −H[P(y | λ)] + log x! + (λ + x + 1) log 2 + F(λ, 0) − F(λ, x),
where for each integer d ≥ 0
F(λ, d) = E_{P(y|λ)}[log (y + d)!] = Σ_{y=0}^∞ P(y | λ) log (y + d)!,
and H[P(y | λ)] is the entropy of P(y | λ).
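As a consistency check (our sketch, with helper names of our own choosing), D(λ, x) can be computed both directly from the definition and via the decomposition above; lgamma(n + 1) plays the role of log n!:

from math import exp, lgamma, log

def log_pois(y, lam):
    # log P(y | lambda) for the Poisson pmf
    return -lam + y * log(lam) - lgamma(y + 1)

def log_ac(y, x):
    # log P(y | x) = log C(x + y, x) - (x + y + 1) * log 2
    return lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1) - (x + y + 1) * log(2)

def kl_direct(lam, x, ymax=400):
    # definition: sum over y of P(y | lam) * log[P(y | lam) / P(y | x)]
    return sum(exp(log_pois(y, lam)) * (log_pois(y, lam) - log_ac(y, x))
               for y in range(ymax))

def kl_decomposed(lam, x, ymax=400):
    # -H[P(y | lam)] + log x! + (lam + x + 1) log 2 + F(lam, 0) - F(lam, x)
    p = [exp(log_pois(y, lam)) for y in range(ymax)]
    H = -sum(p[y] * log_pois(y, lam) for y in range(ymax))
    F = lambda d: sum(p[y] * lgamma(y + d + 1) for y in range(ymax))
    return -H + lgamma(x + 1) + (lam + x + 1) * log(2) + F(0) - F(x)

print(kl_direct(10.0, 10), kl_decomposed(10.0, 10))  # the two values agree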

Slide 18: Minimum of D(λ, x)
One might intuitively expect D(λ, x) to be minimal at x = λ: the conditioning count in the A-C statistic would then be the mean of the underlying Poisson distribution. However, the second mode of that Poisson distribution, λ − 1, is surrounded by enough probability mass to yield:
Theorem 3. For any integer λ ≥ 1, it holds that D(λ, λ) > D(λ, λ − 1). In other words,
D_KL[P(y | λ) || P(y | x = λ)] > D_KL[P(y | λ) || P(y | x = λ − 1)].
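Theorem 3 can be spot-checked with the kl_direct helper from the previous sketch:

for lam in range(1, 30):
    assert kl_direct(float(lam), lam) > kl_direct(float(lam), lam - 1)
print("D(lam, lam) > D(lam, lam - 1) for lam = 1..29")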

Slide 19: Thought experiment 2
[Cartoon: the environment P(· | λ) is now pressed many times. "Repeat this many times... and see how close I get on average." Each press produces a count x and a statistic P(· | x); the "similarity" between P(· | λ) and P(· | x) is averaged over the presses.]

Slide 20: Expectation of D(λ, x) under sampling of x
Given an underlying Poisson distribution P(x | λ), if we repeatedly generated a representative count x from P(x | λ), what would be the average divergence of the corresponding A-C statistic P(y | x) from the truth P(y | λ)?
We are interested in the quantity
E(λ) = E_{P(x|λ)}[D(λ, x)].   (1)
Up to terms of order O(1/λ), the expected divergence of the A-C statistic P(y | x) from the true underlying Poisson distribution P(y | λ) is equal to (1/2) log 2.

Slide 21: Expectation of D(λ, x)
Theorem 4. Consider an underlying Poisson distribution P(· | λ) parametrized by some λ > 0. Then
E(λ) = E_{P(x|λ)}[ D_KL[P(y | λ) || P(y | x)] ] = (1/2) log 2 + O(1/λ).
Sketch of proof:
E(λ) = λ (log λ − log e + 2 log 2) + log 2 + F(λ, 0) − E_{P(x|λ)}[F(λ, x)],
E_{P(x|λ)}[F(λ, x)] = F(2λ, 0),
F(λ, 0) = λ (log λ − log e) + (1/2) log(2πeλ) + O(1/λ).
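A numerical illustration of Theorem 4 (our sketch): E(λ) is evaluated as a Poisson-weighted sum of the divergences and compared to (1/2) log 2:

def expected_kl(lam, xmax=200):
    # E(lam) = sum over x of P(x | lam) * D(lam, x)
    return sum(exp(log_pois(x, lam)) * kl_direct(lam, x) for x in range(xmax))

half_log2 = 0.5 * log(2)  # about 0.3466 nats, i.e. half a bit
for lam in (1.0, 5.0, 20.0):
    print(lam, expected_kl(lam), half_log2)
# E(lam) tends to (1/2) log 2 as lam grows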

Slide 22: Higher order expansion of E(λ)
Entropy expansion (natural logarithm):
H[P(y | λ)] = (1/2) log(2πeλ) − 1/(12λ) − 1/(24λ²) + O(1/λ³).
Expected divergence, measured in bits:
E(λ) = 1/2 − 1/(24λ ln 2) − 1/(32λ² ln 2) + O(1/λ³).
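Continuing the sketch, and treating the reconstructed expansion coefficients above as an assumption, the asymptotic formula can be compared with the numerically summed E(λ), converted to bits:

for lam in (5.0, 10.0, 40.0):
    exact_bits = expected_kl(lam) / log(2)
    approx_bits = 0.5 - 1 / (24 * lam * log(2)) - 1 / (32 * lam ** 2 * log(2))
    print(lam, round(exact_bits, 5), round(approx_bits, 5))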

Slide 23: Analytical approximation of E(λ)
[Figure: E[D_KL] plotted as a function of λ; the numerically computed values ("numer") are shown against the analytical approximation ("anal").]

Slide 24: Discussion
The Audic-Claverie method is a popular approach for detection of differentially expressed genes in the SAGE framework.
Main assumption: under the null hypothesis, the tag counts x, y in two libraries come from the same but unknown Poisson distribution P(· | λ).
The problem: each SAGE library represents only a single measurement.
The Poisson distribution is rather "rigid": it is unimodal and parametrized by a single parameter λ representing both its mean and variance.
As a result, learning about P(· | λ) from a very limited sample (as one is effectively bound to do in the SAGE framework) is much less problematic than one might naively expect.

Slide 25: Discussion (cont'd)
We analyzed how close the A-C statistic P(· | x) is, in terms of K-L divergence, to the underlying Poisson distribution P(· | λ) of the tag counts.
On average, the A-C statistic is never too far from the true underlying distribution: up to terms of order O(1/λ³), on average, the A-C statistic is never further away from the truth P(· | λ) than half a bit of additional information.
Hence, the Audic-Claverie method can be expected to work well even though the SAGE libraries represent very sparse samples.

Slide 26: Discussion (cont'd)
So far, the Audic-Claverie methodology has been verified only empirically, through a series of specific Monte Carlo simulations; it has not been clear how general the apparently stable simulation findings were.
The A-C statistic is universally applicable in any situation where inferences about an underlying Poisson distribution must be made based on an extremely sparse sample.
In the Monte Carlo simulations, the false alarm rate was small for genes associated with small tag counts and gradually increased for higher tag counts. These findings are consistent with our theoretically calculated divergence function E(λ).

Slide 27: Thank you!
Further reading, full proofs, etc.:
P. Tiňo: Basic Properties and Information Theory of Audic-Claverie Statistic for Analyzing cDNA Arrays. BMC Bioinformatics, 10:310, 2009. Open access, or pxt/my.publ.html
