Closed-form sampling formulas for the coalescent with recombination
|
|
- Paula Warner
- 5 years ago
- Views:
Transcription
1 0 / 21 Closed-form sampling formulas for the coalescent with recombination Yun S. Song CS Division and Department of Statistics University of California, Berkeley September 7, 2009 Joint work with Paul Jenkins
2 0 / 21 Outline 1 Introduction Motivation 2 Asymptotic Sampling Formula Closed-form one-locus sampling formulas Two-locus sample configuration Universality Details Classifying the maximum likelihood estimate (MLE) 3 Accuracy Results Detailed examples Distribution of errors 4 Conclusion Summary Further work
3 1 / 21 Motivation Sample 1 = AACTAGG...CCGTGACC...ACAGCTAT Sample 2 = AACTAGG...CCGTAACC...ACAGCTAT Sample 3 = AACTGGG...CCGTGACC...ACAGCTAT Sample 4 = AACTGGG...CCGTAACC...ACAGTTAT Sample 5 = AACTAGG...CCGTGACC...ACAGTTAT The basic problem What is the probability of observing a sample of DNA sequences for a given population genetics model? This is behind many problems in population genetics, e.g. Estimating parameters: L(θ, ρ) = P(D θ, ρ) Ancestral inference Detecting departures from neutrality
4 1 / 21 Motivation Sample 1 = AACTAGG...CCGTGACC...ACAGCTAT Sample 2 = AACTAGG...CCGTAACC...ACAGCTAT Sample 3 = AACTGGG...CCGTGACC...ACAGCTAT Sample 4 = AACTGGG...CCGTAACC...ACAGTTAT Sample 5 = AACTAGG...CCGTGACC...ACAGTTAT The basic problem What is the probability of observing a sample of DNA sequences for a given population genetics model? But, except in the one-locus case with a special model of mutation, closed-form sampling formulas are generally unknown. When recombination is involved, obtaining an analytic formula for the sampling distribution has so far remained an intractable problem.
5 2 / 21 Motivation Monte Carlo Approaches Importance Sampling MCMC Rejection algorithms
6 2 / 21 Motivation Monte Carlo Approaches Importance Sampling MCMC Rejection algorithms The coalescent with recombination
7 Motivation Monte Carlo Approaches Importance Sampling MCMC Rejection algorithms The coalescent with recombination For large recombination rates, the genealogies sampled by Monte Carlo methods are typically very complicated, containing many recombination events. 2 / 21
8 Motivation Monte Carlo Approaches Importance Sampling MCMC Rejection algorithms The coalescent with recombination For large recombination rates, the genealogies sampled by Monte Carlo methods are typically very complicated, containing many recombination events. However, we in fact expect the dynamics to be easier to study for large recombination rates, since the loci under consideration would then be less dependent. 2 / 21
9 Motivation Monte Carlo Approaches Importance Sampling MCMC Rejection algorithms The coalescent with recombination For large recombination rates, the genealogies sampled by Monte Carlo methods are typically very complicated, containing many recombination events. However, we in fact expect the dynamics to be easier to study for large recombination rates, since the loci under consideration would then be less dependent. It seems plausible that there exists a stochastic process simpler than the standard coalescent with recombination, that describes the dynamics of the relevant degrees of freedom for large recombination rates. 2 / 21
10 3 / 21 Motivation Closed-form sampling formula for large recombination rates?
11 Motivation Closed-form sampling formula for large recombination rates? A two-locus model The loci are labeled A and B. ρ, the population-scaled recombination rate 3 / 21
12 Motivation Closed-form sampling formula for large recombination rates? A two-locus model The loci are labeled A and B. ρ, the population-scaled recombination rate n, sample configuration (defined later) q(n), the sampling distribution of n 3 / 21
13 Motivation Closed-form sampling formula for large recombination rates? A two-locus model The loci are labeled A and B. ρ, the population-scaled recombination rate n, sample configuration (defined later) q(n), the sampling distribution of n Asymptotic Series As ρ, find q(n ρ) = q 0 (n)+ q 1(n) + q ( ) 2(n) 1 ρ ρ 2 +O ρ 3, where q 0 (n), q 1 (n), q 2 (n),... are independent of ρ. 3 / 21
14 Motivation Closed-form sampling formula for large recombination rates? A two-locus model The loci are labeled A and B. ρ, the population-scaled recombination rate n, sample configuration (defined later) q(n), the sampling distribution of n θ A, θ B, the population-scaled mutation rates P A = (Pij A), P B = (Pij A ), transition matrices for the mutation process in the case of a finite-alleles model Asymptotic Series As ρ, find q(n ρ) = q 0 (n)+ q 1(n) + q ( ) 2(n) 1 ρ ρ 2 +O ρ 3, where q 0 (n), q 1 (n), q 2 (n),... are independent of ρ. 3 / 21
15 3 / 21 Outline 1 Introduction Motivation 2 Asymptotic Sampling Formula Closed-form one-locus sampling formulas Two-locus sample configuration Universality Details Classifying the maximum likelihood estimate (MLE) 3 Accuracy Results Detailed examples Distribution of errors 4 Conclusion Summary Further work
16 Closed-form one-locus sampling formulas n = (n 1,..., n K ), where n i denotes the number of gametes with allele i at the locus. q(n), probability of an ordered sample with configuration n. (x) n := x(x + 1) (x + n 1) Infinite-alleles model Ewens sampling formula (1972): q ESF (n) = θk (θ) n K (n i 1)! i=1 Finite-alleles Parent-Independent Mutation (PIM) model Mutation transition matrix satisfies P ij = P j. Wright s sampling formula (1949): q WSF (n) = 1 (θ) n K (θp i ) ni Any diallelic recurrent mutation model can be transformed into a PIM model. i=1 4 / 21
17 Two-locus sample configuration Two-locus sample configuration n = (a, b, c) a = (a i ), K -dim vector b = (b j ), L-dim vector c = (c ij ), K -by-l matrix A sample of 10 gametes c = (c ij ), a K -by-l matrix j {1,..., L} i {1,..., K } Sample size is c = c = / 21
18 6 / 21 Two-locus sample configuration Time Coalescence Mutation Recombination a = (a i ) i {1,...,K }, multiplicity of left half-fragments. b = (b j ) j {1,...,L}, multiplicity of right half-fragments. The complete sampling distribution is then q(a, b, c).
19 7 / 21 Two-locus sample configuration q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3
20 7 / 21 Two-locus sample configuration q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 q 0 (a, b, c) q 0 (a, b, c) is the exact sampling distribution when the two loci are unlinked (ρ = ). c
21 7 / 21 Two-locus sample configuration q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 q 0 (a, b, c) q 0 (a, b, c) is the exact sampling distribution when the two loci are unlinked (ρ = ). c A = (c i ) c B = (c j )
22 7 / 21 Two-locus sample configuration q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 q 0 (a, b, c) q 0 (a, b, c) is the exact sampling distribution when the two loci are unlinked (ρ = ). c A = (c i ) c B = (c j ) add to a to get add to b to get q A (a + c A ) q B (b + c B ).
23 7 / 21 Two-locus sample configuration q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 q 0 (a, b, c) q 0 (a, b, c) is the exact sampling distribution when the two loci are unlinked (ρ = ). Infinite alleles: q 0 (a, b, c) = q A ESF(a + c A )q B ESF(b + c B )
24 7 / 21 Two-locus sample configuration q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 q 0 (a, b, c) q 0 (a, b, c) is the exact sampling distribution when the two loci are unlinked (ρ = ). Infinite alleles: q 0 (a, b, c) = q A ESF(a + c A )q B ESF(b + c B ) Finite-alleles PIM: q 0 (a, b, c) = q A WSF(a + c A )q B WSF(b + c B )
25 7 / 21 Two-locus sample configuration q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 q 0 (a, b, c) q 0 (a, b, c) is the exact sampling distribution when the two loci are unlinked (ρ = ). Infinite alleles: q 0 (a, b, c) = q A ESF(a + c A )q B ESF(b + c B ) Finite-alleles PIM: q 0 (a, b, c) = q A WSF(a + c A )q B WSF(b + c B ) In fact, for any model of mutation, q 0 (a, b, c) = q A (a + c A )q B (b + c B ) where q A, q B are (possibly unknown) one-locus sampling distributions.
26 8 / 21 Universality q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 Universality For all mutation models, the functional form of q 0 (a, b, c) is q 0 (a, b, c) = F (q A, q B ; a, b, c) := q A (a + c A )q B (b + c B )
27 8 / 21 Universality q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 Universality For all mutation models, the functional form of q 0 (a, b, c) is q 0 (a, b, c) = F (q A, q B ; a, b, c) := q A (a + c A )q B (b + c B ) Theorem 1 q 1 (n) also exhibits universality: q 1 (a, b, c) = G(q A, q B ; a, b, c)
28 Universality q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Universality + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 For all mutation models, the functional form of q 0 (a, b, c) is q 0 (a, b, c) = F (q A, q B ; a, b, c) := q A (a + c A )q B (b + c B ) Theorem 1 q 1 (n) also exhibits universality: q 1 (a, b, c) = G(q A, q B ; a, b, c) q 1 (a, b, c) = ( c 2) q B (b + c B ) K q A (a + c A ) L + K L ( cij i=1 j=1 2 q A (a + c A )q B (b + c B ) ( ci i=1 ( 2 c j j=1 2 ) ) q A (a + c A e i ) ) q B (b + c B e j ) q A (a + c A e i )q B (b + c B e j ), where e i is a unit vector with a 1 at entry i and 0 otherwise. 8 / 21
29 9 / 21 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 Derivation of q 1 (a, b, c) 1 The full sampling distribution q(a, b, c) satisfies a recursion relation.
30 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Derivation of q 1 (a, b, c) + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 1 The full sampling distribution q(a, b, c) satisfies a recursion relation. Infinite-alleles case: [n(n 1) + θ A (a + c) + θ B (b + c) + ρc]q(a, b, c) = P K i=1 a i (a i 1 + 2c i )q(a e i, b, c) + P L j=1 b j (b j 1 + 2c j )q(a, b e j, c) + P K P L i=1 j=1 ˆcij (c ij 1)q(a, b, c e ij ) + 2a i b j q(a e i, b e j, c + e ij ) P h +θ K PL i A i=1 j=1 δ a i +c i,1 δ cij,1 q(a, b + e j, c e ij ) + δ ai,1δ ci,0 q(a e i, b, c) P h +θ L PK i B j=1 i=1 δ b j +c j,1 δ cij,1 q(a + e i, b, c e ij ) + δ bj,1δ c j,0 q(a, b e j, c) +ρ P K i=1 P L j=1 c ij q(a + e i, b + e j, c e ij ), with boundary conditions are q(e i, 0, 0) = q(0, e j, 0) = 1 for all i {1,..., K } and j {1,..., L}. 9 / 21
31 9 / 21 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Derivation of q 1 (a, b, c) + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 1 The full sampling distribution q(a, b, c) satisfies a recursion relation. 2 Using that recursion, one can show that q 1 (a, b, c) satisfies K L c ij q 1 (a, b, c) = f 1 (a, b, c)+ c q 1(a +e i, b +e j, c e ij ), i=1 j=1 where f 1 (a, b, c) is a function of the zeroth-order term q 0.
32 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Derivation of q 1 (a, b, c) + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 1 The full sampling distribution q(a, b, c) satisfies a recursion relation. 2 Using that recursion, one can show that q 1 (a, b, c) satisfies K L c ij q 1 (a, b, c) = f 1 (a, b, c)+ c q 1(a +e i, b +e j, c e ij ), i=1 j=1 where f 1 (a, b, c) is a function of the zeroth-order term q 0. 3 This equation admits a probabilistic interpretation: c q 1 (a, b, c) = q 1 (a + c A, b + c B, 0) + E[f 1 (A (m), B (m), C (m) )], m=1 where C (m) = (C (m) ij ) is a multivariate hypergeometric(c, c, m) random variable, and A (m), B (m) depend on C (m). 9 / 21
33 10 / 21 Details q 1 (a, b, c) = q 1 (a + c A, b + c B, 0) + c E[f 1 (A (m), B (m), C (m) )] m=1 Random matrix C (m) = (C (m) ij ) corresponds to a random subsample obtained by sampling without replacement m gametes from c A (m) = a + c A C (m) A C (m) A = (C (m) i ) and C (m) B P (i,j) [ C (m) ij and B (m) = b + c B C (m) B = c (m) ij = (C (m) j ) ] = ( 1 c ) m (i,j) ( cij c (m) ij )., where
34 10 / 21 Details q 1 (a, b, c) = q 1 (a + c A, b + c B, 0) + c E[f 1 (A (m), B (m), C (m) )] m=1 Random matrix C (m) = (C (m) ij ) corresponds to a random subsample obtained by sampling without replacement m gametes from c A (m) = a + c A C (m) A C (m) A = (C (m) i ) and C (m) B and B (m) = b + c B C (m) B = (C (m) j ), where f 1 is a deg-2 polynomial in the entries C ij of C (m), so the above expectation can be easily computed.
35 Details q 1 (a, b, c) = q 1 (a + c A, b + c B, 0) + c E[f 1 (A (m), B (m), C (m) )] m=1 Random matrix C (m) = (C (m) ij ) corresponds to a random subsample obtained by sampling without replacement m gametes from c A (m) = a + c A C (m) A C (m) A = (C (m) i ) and C (m) B and B (m) = b + c B C (m) B = (C (m) j ), where f 1 is a deg-2 polynomial in the entries C ij of C (m), so the above expectation can be easily computed. Lemma For all a and b, q 1 (a, b, 0) = 0. That q 1 (a, b, 0) vanishes is not expected a priori. 10 / 21
36 Details q 1 (a, b, c) = q 1 (a + c A, b + c B, 0) + Theorem 1 c E[f 1 (A (m), B (m), C (m) )], m=1 For either an infinite-alleles or an arbitrary finite-alleles model, ( ) c q 1 (a, b, c) = q A (a + c A )q B (b + c B ) 2 K ( ) q B (b + c B ) ci q A (a + c A e i ) q A (a + c A ) + K L i=1 j=1 ( cij 2 i=1 L j=1 2 ( c j 2 ) q B (b + c B e j ) ) q A (a + c A e i )q B (b + c B e j ), where e i is a unit vector with a 1 at entry i and 0 otherwise. 11 / 21
37 12 / 21 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Theorem 2 + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 For either an infinite-alleles or an arbitrary finite-alleles model, c q 2 (a, b, c) = q 2 (a + c A, b + c B, 0) + E[f 2 (A (m), B (m), C (m) )], m=1 where f 2 is a deg-4 polynomial in the entries C ij of C (m).
38 12 / 21 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Theorem 2 + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 For either an infinite-alleles or an arbitrary finite-alleles model, c q 2 (a, b, c) = q 2 (a + c A, b + c B, 0) + E[f 2 (A (m), B (m), C (m) )], m=1 where f 2 is a deg-4 polynomial in the entries C ij of C (m). Remarks Found a closed-form formula for c m=1 E[f 2(A (m), B (m), C (m) )].
39 12 / 21 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Theorem 2 + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 For either an infinite-alleles or an arbitrary finite-alleles model, c q 2 (a, b, c) = q 2 (a + c A, b + c B, 0) + E[f 2 (A (m), B (m), C (m) )], m=1 where f 2 is a deg-4 polynomial in the entries C ij of C (m). Remarks Found a closed-form formula for c m=1 E[f 2(A (m), B (m), C (m) )]. Unfortunately, q 2 (a, b, 0) 0 in general and we don t have a closed-form expression for it.
40 12 / 21 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Theorem 2 + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 For either an infinite-alleles or an arbitrary finite-alleles model, c q 2 (a, b, c) = q 2 (a + c A, b + c B, 0) + E[f 2 (A (m), B (m), C (m) )], m=1 where f 2 is a deg-4 polynomial in the entries C ij of C (m). Remarks Found a closed-form formula for c m=1 E[f 2(A (m), B (m), C (m) )]. Unfortunately, q 2 (a, b, 0) 0 in general and we don t have a closed-form expression for it. However, it satisfies a simple recursion relation that can be easily solved numerically using dynamic programming.
41 Details q(a, b, c ρ) = q 0 (a, b, c) + q 1(a, b, c) ρ Theorem 2 + q 2(a, b, c) ρ 2 ( ) 1 + O ρ 3 For either an infinite-alleles or an arbitrary finite-alleles model, c q 2 (a, b, c) = q 2 (a + c A, b + c B, 0) + E[f 2 (A (m), B (m), C (m) )], m=1 where f 2 is a deg-4 polynomial in the entries C ij of C (m). Remarks Found a closed-form formula for c m=1 E[f 2(A (m), B (m), C (m) )]. Unfortunately, q 2 (a, b, 0) 0 in general and we don t have a closed-form expression for it. However, it satisfies a simple recursion relation that can be easily solved numerically using dynamic programming. The contribution of q 2 (a, b, 0) 0 is negligibly small compared to that of the other terms. 12 / 21
42 13 / 21 Classifying the maximum likelihood estimate (MLE) Application: Classification of the MLE of ρ Theorem 3 (A sufficient condition for finite MLE) For a given sample configuration (a, b, c), if q 1 (a, b, c) > 0, then the MLE of ρ is finite.
43 Classifying the maximum likelihood estimate (MLE) Application: Classification of the MLE of ρ Theorem 3 (A sufficient condition for finite MLE) For a given sample configuration (a, b, c), if q 1 (a, b, c) > 0, then the MLE of ρ is finite. Is the converse true? If q 1 (a, b, c) < 0, then is MLE infinite? 13 / 21
44 Classifying the maximum likelihood estimate (MLE) Application: Classification of the MLE of ρ Theorem 3 (A sufficient condition for finite MLE) For a given sample configuration (a, b, c), if q 1 (a, b, c) > 0, then the MLE of ρ is finite. Counter-example to the converse log likelihood ! Finite alleles PIM model, θ = a = b = 0, c = ( ). q 1 (a, b, c) < 0. q q q(n) 13 / 21
45 Classifying the maximum likelihood estimate (MLE) Application: Classification of the MLE of ρ Theorem 3 (A sufficient condition for finite MLE) For a given sample configuration (a, b, c), if q 1 (a, b, c) > 0, then the MLE of ρ is finite. Counter-example to the converse log likelihood ! Finite alleles PIM model, θ = a = b = 0, c = ( ). q 1 (a, b, c) < 0. q q q(n) 13 / 21
46 13 / 21 Classifying the maximum likelihood estimate (MLE) Application: Classification of the MLE of ρ Theorem 3 (A sufficient condition for finite MLE) For a given sample configuration (a, b, c), if q 1 (a, Example b, c) > 0, then the MLE of ρ is finite Counter-example to the converse log likelihood ! log likelihood Finite alleles PIM model, θ = a = b = 0, c = ( ) ! q 1 (a, b, c) < 0. q q q(n)
47 13 / 21 Outline 1 Introduction Motivation 2 Asymptotic Sampling Formula Closed-form one-locus sampling formulas Two-locus sample configuration Universality Details Classifying the maximum likelihood estimate (MLE) 3 Accuracy Results Detailed examples Distribution of errors 4 Conclusion Summary Further work
48 14 / 21 Detailed examples Truncate at order 2 q ASF (a, b, c) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 should approximate q(a, b, c) accurately for sufficiently large ρ.
49 14 / 21 Detailed examples Truncate at order 2 q ASF (a, b, c) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 should approximate q(a, b, c) accurately for sufficiently large ρ. Example 1 Likelihood (10 15 ) ρ a = b = 0, c = ( Finite alleles PIM model, θ = 0.01 per locus. q q q q q(n) q 0 (n) q 0 (n) + q 1(n) ρ q0 (n) + q 1(n) ρ + q 2(n) ρ 2 )
50 14 / 21 Detailed examples Truncate at order 2 q ASF (a, b, c) = q 0 (a, b, c) + q 1(a, b, c) ρ + q 2(a, b, c) ρ 2 should approximate q(a, b, c) accurately for sufficiently large ρ. Example 2 Likeihood (10 16 ) ! a = b = 0, c = ( Finite alleles PIM model, θ = 0.01 per locus. q q q q q(n) q 0 (n) q 0 (n) + q 1(n) ρ q0 (n) + q 1(n) ρ + q 2(n) ρ 2 )
51 Distribution of errors A measure for accuracy: unsigned relative error q ASF (n) q(n) q(n) 100% q q q q 0 (n) q 0 (n) + q 1(n) ρ q0 (n) + q 1(n) ρ + q 2(n) ρ 2 Accuracy across all dimorphic (0, 0, c) samples of size 20, when ρ = 50 in the symmetric PIM model Density Density Unsigned relative error (%) θ = Unsigned relative error (%) θ = 1 15 / 21
52 Distribution of errors Accuracy: effect of recombination rate Accuracy of q 0 (0, 0, c) + q 1(0,0,c) ρ Accuracy across all dimorphic samples. + q 2(0,0,c) ρ 2 : Fix c = 20, vary ρ Density Unsigned relative error (%) θ = 0.01 Density ρ = ρ = 50 ρ = 100 ρ = Unsigned relative error (%) θ = 1 16 / 21
53 17 / 21 Distribution of errors For which samples is the ASF most accurate? log 10 p(c) Unsigned relative error (%) θ = 0.01
54 17 / 21 Distribution of errors For which samples is the ASF most accurate? Monomorphic at both loci. log 10 p(c) Unsigned relative error (%) θ = 0.01 None of the above.
55 17 / 21 Distribution of errors For which samples is the ASF most accurate? log 10 p(c) Monomorphic at both loci. Monomorphic at one locus Unsigned relative error (%) θ = 0.01 None of the above.
56 17 / 21 Distribution of errors For which samples is the ASF most accurate? log 10 p(c) Monomorphic at both loci. Monomorphic at one locus. At least one allele has multiplicity one Unsigned relative error (%) θ = 0.01 None of the above.
57 17 / 21 Distribution of errors For which samples is the ASF most accurate? log 10 p(c) Unsigned relative error (%) Monomorphic at both loci. Monomorphic at one locus. At least one allele has multiplicity one. + Very low LD: r θ = 0.01 None of the above.
58 17 / 21 Distribution of errors For which samples is the ASF most accurate? log 10 p(c) Unsigned relative error (%) Monomorphic at both loci. Monomorphic at one locus. At least one allele has multiplicity one. + Very low LD: r Perfect LD. θ = 0.01 None of the above.
59 17 / 21 Distribution of errors For which samples is the ASF most accurate? log 10 p(c) Unsigned relative error (%) θ = 0.01 Monomorphic at both loci. Monomorphic at one locus. At least one allele has multiplicity one. + Very low LD: r Perfect LD. Perfect LD except for one haplotype. None of the above.
60 17 / 21 Distribution of errors For which samples is the ASF most accurate? log 10 p(c) 6 log 10 p(c) Unsigned relative error (%) Unsigned relative error (%) θ = 0.01 θ = 1
61 17 / 21 Outline 1 Introduction Motivation 2 Asymptotic Sampling Formula Closed-form one-locus sampling formulas Two-locus sample configuration Universality Details Classifying the maximum likelihood estimate (MLE) 3 Accuracy Results Detailed examples Distribution of errors 4 Conclusion Summary Further work
62 18 / 21 Summary Asymptotic series: q(n ρ) = q 0 (n)+ q 1(n) + q 2(n) ρ ρ 2 +O ( ) 1 ρ 3 Summary of results 1 Found a closed-form formula for the first-order term q 1 (n).
63 18 / 21 Summary Asymptotic series: q(n ρ) = q 0 (n)+ q 1(n) + q 2(n) ρ ρ 2 +O ( ) 1 ρ 3 Summary of results 1 Found a closed-form formula for the first-order term q 1 (n). 2 This formula is universal to all mutation models.
64 18 / 21 Summary Asymptotic series: q(n ρ) = q 0 (n)+ q 1(n) + q 2(n) ρ ρ 2 +O ( ) 1 ρ 3 Summary of results 1 Found a closed-form formula for the first-order term q 1 (n). 2 This formula is universal to all mutation models. 3 Found a closed-form formula + (extra bit) for q 2 (n). c q 2 (a, b, c) = q 2 (a + c A, b + c B, 0) + E[f 2 (A (m), B (m), C (m) )]. m=1
65 18 / 21 Summary Asymptotic series: q(n ρ) = q 0 (n)+ q 1(n) + q 2(n) ρ ρ 2 +O ( ) 1 ρ 3 Summary of results 1 Found a closed-form formula for the first-order term q 1 (n). 2 This formula is universal to all mutation models. 3 Found a closed-form formula + (extra bit) for q 2 (n). c q 2 (a, b, c) = q 2 (a + c A, b + c B, 0) + E[f 2 (A (m), B (m), C (m) )]. m=1 4 The extra bit is negligibly small and can be easily evaluated using dynamic programming if one wishes.
66 18 / 21 Summary Asymptotic series: q(n ρ) = q 0 (n)+ q 1(n) + q 2(n) ρ ρ 2 +O ( ) 1 ρ 3 Summary of results 1 Found a closed-form formula for the first-order term q 1 (n). 2 This formula is universal to all mutation models. 3 Found a closed-form formula + (extra bit) for q 2 (n). c q 2 (a, b, c) = q 2 (a + c A, b + c B, 0) + E[f 2 (A (m), B (m), C (m) )]. m=1 4 The extra bit is negligibly small and can be easily evaluated using dynamic programming if one wishes. 5 Found a simple sufficient condition for a given two-locus sample configuration to have a finite MLE of ρ.
67 19 / 21 Summary Applications Papers Composite-likelihood method such as LDhat. (Monte Carlo methods become substantially less efficient as ρ increases, so our analytic work will be of practical utility.) Test for epistasis. Sample-wise LD: P[r 2 ] and E[r 2 ] as ρ. Infinite-alleles case: Jenkins, P.A. and Song, Y.S. An asymptotic sampling formula for the coalescent with recombination. Annals of Applied Probability, under minor revision. Arbitrary finite-alleles recurrent mutation case: Jenkins, P.A. and Song, Y.S. Closed-form two-locus sampling distributions: accuracy and universality. Genetics, in press.
68 Further work Future work 1 Does there exist a stochastic process dual to the coalescent with recombination? 2 Is there a combinatorial interpretation for q 1 (a, b, c)? cf. Interpretation of Ewens sampling formula as the joint distribution of cycle counts in a random permutation 3 Does the universality property extend to q k (c) for k > 1? 4 For all k > 0, we think the following is true: c q k (a, b, c) = q k (a + c A, b + c B, 0) + E[f k (A (m), B (m), C (m) )] m=1 where f k is a deg-2k polynomial in the entries C ij of C (m). Can we automate the computation of the expectation? q 5 Does k (a, b, c) ρ k converge? k=0 6 Extend to multiple loci? 20 / 21
69 Further work Acknowledgments Thank you for your attention! Useful discussions Bob Griffiths Chuck Langley Rasmus Nielsen Josh Paul Monty Slatkin Programming Fulton Wang Research supported in part by NIH, Alfred P. Sloan Research Fellowship, and Packard Fellowship for Science and Engineering 21 / 21
AN ASYMPTOTIC SAMPLING FORMULA FOR THE COALESCENT WITH RECOMBINATION. By Paul A. Jenkins and Yun S. Song, University of California, Berkeley
AN ASYMPTOTIC SAMPLING FORMULA FOR THE COALESCENT WITH RECOMBINATION By Paul A. Jenkins and Yun S. Song, University of California, Berkeley Ewens sampling formula (ESF) is a one-parameter family of probability
More informationCLOSED-FORM ASYMPTOTIC SAMPLING DISTRIBUTIONS UNDER THE COALESCENT WITH RECOMBINATION FOR AN ARBITRARY NUMBER OF LOCI
Adv. Appl. Prob. 44, 391 407 (01) Printed in Northern Ireland Applied Probability Trust 01 CLOSED-FORM ASYMPTOTIC SAMPLING DISTRIBUTIONS UNDER THE COALESCENT WITH RECOMBINATION FOR AN ARBITRARY NUMBER
More informationBy Paul A. Jenkins and Yun S. Song, University of California, Berkeley July 26, 2010
PADÉ APPROXIMANTS AND EXACT TWO-LOCUS SAMPLING DISTRIBUTIONS By Paul A. Jenkins and Yun S. Song, University of California, Berkeley July 26, 2010 For population genetics models with recombination, obtaining
More informationA Principled Approach to Deriving Approximate Conditional Sampling. Distributions in Population Genetics Models with Recombination
Genetics: Published Articles Ahead of Print, published on June 30, 2010 as 10.1534/genetics.110.117986 A Principled Approach to Deriving Approximate Conditional Sampling Distributions in Population Genetics
More informationFrequency Spectra and Inference in Population Genetics
Frequency Spectra and Inference in Population Genetics Although coalescent models have come to play a central role in population genetics, there are some situations where genealogies may not lead to efficient
More informationAnalytic Computation of the Expectation of the Linkage Disequilibrium Coefficient r 2
Analytic Computation of the Expectation of the Linkage Disequilibrium Coefficient r 2 Yun S. Song a and Jun S. Song b a Department of Computer Science, University of California, Davis, CA 95616, USA b
More information1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:
.5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the
More informationSupplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles
Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To
More informationChapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)
12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²
More informationCounting All Possible Ancestral Configurations of Sample Sequences in Population Genetics
1 Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics Yun S. Song, Rune Lyngsø, and Jotun Hein Abstract Given a set D of input sequences, a genealogy for D can be
More informationSolutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin
Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin CHAPTER 1 1.2 The expected homozygosity, given allele
More information2. Map genetic distance between markers
Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,
More informationThe Wright-Fisher Model and Genetic Drift
The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population
More informationMatch probabilities in a finite, subdivided population
Match probabilities in a finite, subdivided population Anna-Sapfo Malaspinas a, Montgomery Slatkin a, Yun S. Song b, a Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
More informationp(d g A,g B )p(g B ), g B
Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)
More informationI of a gene sampled from a randomly mating popdation,
Copyright 0 1987 by the Genetics Society of America Average Number of Nucleotide Differences in a From a Single Subpopulation: A Test for Population Subdivision Curtis Strobeck Department of Zoology, University
More informationFor 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.
STAT 550 Howework 6 Anton Amirov 1. This question relates to the same study you saw in Homework-4, by Dr. Arno Motulsky and coworkers, and published in Thompson et al. (1988; Am.J.Hum.Genet, 42, 113-124).
More informationMathematical models in population genetics II
Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population
More informationPopulation Genetics I. Bio
Population Genetics I. Bio5488-2018 Don Conrad dconrad@genetics.wustl.edu Why study population genetics? Functional Inference Demographic inference: History of mankind is written in our DNA. We can learn
More informationSampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo. Sampling Methods. Oliver Schulte - CMPT 419/726. Bishop PRML Ch.
Sampling Methods Oliver Schulte - CMP 419/726 Bishop PRML Ch. 11 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs A particular tree structure
More informationStatistical population genetics
Statistical population genetics Lecture 7: Infinite alleles model Xavier Didelot Dept of Statistics, Univ of Oxford didelot@stats.ox.ac.uk Slide 111 of 161 Infinite alleles model We now discuss the effect
More informationURN MODELS: the Ewens Sampling Lemma
Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 3, 2014 1 2 3 4 Mutation Mutation: typical values for parameters Equilibrium Probability of fixation 5 6 Ewens Sampling
More informationBayesian Methods with Monte Carlo Markov Chains II
Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3
More informationEstimating Evolutionary Trees. Phylogenetic Methods
Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent
More informationPhasing via the Expectation Maximization (EM) Algorithm
Computing Haplotype Frequencies and Haplotype Phasing via the Expectation Maximization (EM) Algorithm Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 14, 2010 Outline
More informationThe Combinatorial Interpretation of Formulas in Coalescent Theory
The Combinatorial Interpretation of Formulas in Coalescent Theory John L. Spouge National Center for Biotechnology Information NLM, NIH, DHHS spouge@ncbi.nlm.nih.gov Bldg. A, Rm. N 0 NCBI, NLM, NIH Bethesda
More informationBayesian Inference of Interactions and Associations
Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,
More informationPopulation Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda
1 Population Genetics with implications for Linkage Disequilibrium Chiara Sabatti, Human Genetics 6357a Gonda csabatti@mednet.ucla.edu 2 Hardy-Weinberg Hypotheses: infinite populations; no inbreeding;
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationLecture 9. QTL Mapping 2: Outbred Populations
Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred
More informationCS 188: Artificial Intelligence. Bayes Nets
CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew
More informationCS294: Pseudorandomness and Combinatorial Constructions September 13, Notes for Lecture 5
UC Berkeley Handout N5 CS94: Pseudorandomness and Combinatorial Constructions September 3, 005 Professor Luca Trevisan Scribe: Gatis Midrijanis Notes for Lecture 5 In the few lectures we are going to look
More informationAEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity,
AEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity, Today: Review Probability in Populatin Genetics Review basic statistics Population Definition
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationGenetic hitch-hiking in a subdivided population
Genet. Res., Camb. (1998), 71, pp. 155 160. With 3 figures. Printed in the United Kingdom 1998 Cambridge University Press 155 Genetic hitch-hiking in a subdivided population MONTGOMERY SLATKIN* AND THOMAS
More informationSTAT 536: Genetic Statistics
STAT 536: Genetic Statistics Frequency Estimation Karin S. Dorman Department of Statistics Iowa State University August 28, 2006 Fundamental rules of genetics Law of Segregation a diploid parent is equally
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationLecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012
Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based
More informationModelling Linkage Disequilibrium, And Identifying Recombination Hotspots Using SNP Data
Modelling Linkage Disequilibrium, And Identifying Recombination Hotspots Using SNP Data Na Li and Matthew Stephens July 25, 2003 Department of Biostatistics, University of Washington, Seattle, WA 98195
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationModelling of Dependent Credit Rating Transitions
ling of (Joint work with Uwe Schmock) Financial and Actuarial Mathematics Vienna University of Technology Wien, 15.07.2010 Introduction Motivation: Volcano on Iceland erupted and caused that most of the
More informationON THE DIFFUSION OPERATOR IN POPULATION GENETICS
J. Appl. Math. & Informatics Vol. 30(2012), No. 3-4, pp. 677-683 Website: http://www.kcam.biz ON THE DIFFUSION OPERATOR IN POPULATION GENETICS WON CHOI Abstract. W.Choi([1]) obtains a complete description
More informationThe Moran Process as a Markov Chain on Leaf-labeled Trees
The Moran Process as a Markov Chain on Leaf-labeled Trees David J. Aldous University of California Department of Statistics 367 Evans Hall # 3860 Berkeley CA 94720-3860 aldous@stat.berkeley.edu http://www.stat.berkeley.edu/users/aldous
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationStationary Distribution of the Linkage Disequilibrium Coefficient r 2
Stationary Distribution of the Linkage Disequilibrium Coefficient r 2 Wei Zhang, Jing Liu, Rachel Fewster and Jesse Goodman Department of Statistics, The University of Auckland December 1, 2015 Overview
More informationThe Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Bob Griffiths University of Oxford
The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion Bob Griffiths University of Oxford A d-dimensional Λ-Fleming-Viot process {X(t)} t 0 representing frequencies of d types of individuals
More informationEstimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates
Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates JOSEPH FELSENSTEIN Department of Genetics SK-50, University
More informationSTAT 536: Genetic Statistics
STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationCombining the cycle index and the Tutte polynomial?
Combining the cycle index and the Tutte polynomial? Peter J. Cameron University of St Andrews Combinatorics Seminar University of Vienna 23 March 2017 Selections Students often meet the following table
More informationProblems for 3505 (2011)
Problems for 505 (2011) 1. In the simplex of genotype distributions x + y + z = 1, for two alleles, the Hardy- Weinberg distributions x = p 2, y = 2pq, z = q 2 (p + q = 1) are characterized by y 2 = 4xz.
More informationInteger Programming in Computational Biology. D. Gusfield University of California, Davis Presented December 12, 2016.!
Integer Programming in Computational Biology D. Gusfield University of California, Davis Presented December 12, 2016. There are many important phylogeny problems that depart from simple tree models: Missing
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationCS1820 Notes. hgupta1, kjline, smechery. April 3-April 5. output: plausible Ancestral Recombination Graph (ARG)
CS1820 Notes hgupta1, kjline, smechery April 3-April 5 April 3 Notes 1 Minichiello-Durbin Algorithm input: set of sequences output: plausible Ancestral Recombination Graph (ARG) note: the optimal ARG is
More informationThe Quantitative TDT
The Quantitative TDT (Quantitative Transmission Disequilibrium Test) Warren J. Ewens NUS, Singapore 10 June, 2009 The initial aim of the (QUALITATIVE) TDT was to test for linkage between a marker locus
More information6 Introduction to Population Genetics
70 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 19, 2011 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,
More information26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G.
10-708: Probabilistic Graphical Models, Spring 2015 26 : Spectral GMs Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 1 Introduction A common task in machine learning is to work with
More informationMonte Carlo in Bayesian Statistics
Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview
More information122 9 NEUTRALITY TESTS
122 9 NEUTRALITY TESTS 9 Neutrality Tests Up to now, we calculated different things from various models and compared our findings with data. But to be able to state, with some quantifiable certainty, that
More informationBayesian Estimation of Input Output Tables for Russia
Bayesian Estimation of Input Output Tables for Russia Oleg Lugovoy (EDF, RANE) Andrey Polbin (RANE) Vladimir Potashnikov (RANE) WIOD Conference April 24, 2012 Groningen Outline Motivation Objectives Bayesian
More informationHaplotyping as Perfect Phylogeny: A direct approach
Haplotyping as Perfect Phylogeny: A direct approach Vineet Bafna Dan Gusfield Giuseppe Lancia Shibu Yooseph February 7, 2003 Abstract A full Haplotype Map of the human genome will prove extremely valuable
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationBasic Sampling Methods
Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution
More informationPopulation genetics snippets for genepop
Population genetics snippets for genepop Peter Beerli August 0, 205 Contents 0.Basics 0.2Exact test 2 0.Fixation indices 4 0.4Isolation by Distance 5 0.5Further Reading 8 0.6References 8 0.7Disclaimer
More informationBIRS workshop Sept 6 11, 2009
Diploid biparental Moran model with large offspring numbers and recombination Bjarki Eldon New mathematical challenges from molecular biology and genetics BIRS workshop Sept 6, 2009 Mendel s Laws The First
More informationLinkage and Linkage Disequilibrium
Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies
More information6 Introduction to Population Genetics
Grundlagen der Bioinformatik, SoSe 14, D. Huson, May 18, 2014 67 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,
More informationABCME: Summary statistics selection for ABC inference in R
ABCME: Summary statistics selection for ABC inference in R Matt Nunes and David Balding Lancaster University UCL Genetics Institute Outline Motivation: why the ABCME package? Description of the package
More informationLecture 18 : Ewens sampling formula
Lecture 8 : Ewens sampling formula MATH85K - Spring 00 Lecturer: Sebastien Roch References: [Dur08, Chapter.3]. Previous class In the previous lecture, we introduced Kingman s coalescent as a limit of
More informationMathematical Biology. Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. Matthias Birkner Jochen Blath
J. Math. Biol. DOI 0.007/s00285-008-070-6 Mathematical Biology Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model Matthias Birkner Jochen Blath Received:
More informationBayes Nets: Sampling
Bayes Nets: Sampling [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.] Approximate Inference:
More informationEXERCISES FOR CHAPTER 3. Exercise 3.2. Why is the random mating theorem so important?
Statistical Genetics Agronomy 65 W. E. Nyquist March 004 EXERCISES FOR CHAPTER 3 Exercise 3.. a. Define random mating. b. Discuss what random mating as defined in (a) above means in a single infinite population
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationAnnouncements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic
CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning More Approximate Inference Mark Schmidt University of British Columbia Winter 2018 Last Time: Approximate Inference We ve been discussing graphical models for density estimation,
More informationCSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation
CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007 Genetic Polymorphism Single nucleotide polymorphism (SNP) Genetic Polymorphism
More informationLecture 8 - Algebraic Methods for Matching 1
CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 1, 2018 Lecture 8 - Algebraic Methods for Matching 1 In the last lecture we showed that
More informationCompute the Fourier transform on the first register to get x {0,1} n x 0.
CS 94 Recursive Fourier Sampling, Simon s Algorithm /5/009 Spring 009 Lecture 3 1 Review Recall that we can write any classical circuit x f(x) as a reversible circuit R f. We can view R f as a unitary
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April
More informationQuality Control Using Inferential Statistics In Weibull Based Reliability Analyses S. F. Duffy 1 and A. Parikh 2
Quality Control Using Inferential Statistics In Weibull Based Reliability Analyses S. F. Duffy 1 and A. Parikh 2 1 Cleveland State University 2 N & R Engineering www.inl.gov ASTM Symposium on Graphite
More informationBTRY 4830/6830: Quantitative Genomics and Genetics
BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements
More information1.5 MM and EM Algorithms
72 CHAPTER 1. SEQUENCE MODELS 1.5 MM and EM Algorithms The MM algorithm [1] is an iterative algorithm that can be used to minimize or maximize a function. We focus on using it to maximize the log likelihood
More informationAn introduction to mathematical modeling of the genealogical process of genes
An introduction to mathematical modeling of the genealogical process of genes Rikard Hellman Kandidatuppsats i matematisk statistik Bachelor Thesis in Mathematical Statistics Kandidatuppsats 2009:3 Matematisk
More informationCoalescent based demographic inference. Daniel Wegmann University of Fribourg
Coalescent based demographic inference Daniel Wegmann University of Fribourg Introduction The current genetic diversity is the outcome of past evolutionary processes. Hence, we can use genetic diversity
More informationA Frequentist Assessment of Bayesian Inclusion Probabilities
A Frequentist Assessment of Bayesian Inclusion Probabilities Department of Statistical Sciences and Operations Research October 13, 2008 Outline 1 Quantitative Traits Genetic Map The model 2 Bayesian Model
More informationMore Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction
Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order
More informationA sequentially Markov conditional sampling distribution for structured populations with migration and recombination
A sequentially Markov conditional sampling distribution for structured populations with migration and recombination Matthias Steinrücken a, Joshua S. Paul b, Yun S. Song a,b, a Department of Statistics,
More informationk-protected VERTICES IN BINARY SEARCH TREES
k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from
More informationI Have the Power in QTL linkage: single and multilocus analysis
I Have the Power in QTL linkage: single and multilocus analysis Benjamin Neale 1, Sir Shaun Purcell 2 & Pak Sham 13 1 SGDP, IoP, London, UK 2 Harvard School of Public Health, Cambridge, MA, USA 3 Department
More informationIntroduction to Linkage Disequilibrium
Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have
More informationAssessing Regime Uncertainty Through Reversible Jump McMC
Assessing Regime Uncertainty Through Reversible Jump McMC August 14, 2008 1 Introduction Background Research Question 2 The RJMcMC Method McMC RJMcMC Algorithm Dependent Proposals Independent Proposals
More informationJoyce, Krone, and Kurtz
Statistical Inference for Population Genetics Models by Paul Joyce University of Idaho A large body of mathematical population genetics was developed by the three main speakers in this symposium. As a
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Two sample T 2 test 1 Two sample T 2 test 2 Analogous to the univariate context, we
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationThe purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.
Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That
More informationLinear Regression (1/1/17)
STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression
More informationESTIMATION of recombination fractions using ped- ber of pairwise differences (Hudson 1987; Wakeley
Copyright 2001 by the Genetics Society of America Estimating Recombination Rates From Population Genetic Data Paul Fearnhead and Peter Donnelly Department of Statistics, University of Oxford, Oxford, OX1
More information