Stationary Distribution of the Linkage Disequilibrium Coefficient $r^2$

Wei Zhang, Jing Liu, Rachel Fewster and Jesse Goodman
Department of Statistics, The University of Auckland
December 1, 2015

Overview

1. Introduction
2. Background
3. Methodology
4. Results
5. Acknowledgements
6. References

Introduction

1. Linkage disequilibrium (LD) indicates the statistical dependence of alleles at different loci in population genetics. LD can be used to find genetic markers that might be linked to disease genes and to understand evolutionary history.
2. $r^2$ is a quantitative measure of LD, and we aim to find its stationary distribution under models for genetic drift.
3. Given some moments of the unknown distribution, the maximum entropy (Maxent) principle can be used to approximate the density function of $r^2$.
4. The diffusion approximation is a powerful tool for computing certain expectations at stationarity.

Background

- The TLD model (Liu, 2012)
- Diffusion approximation
- Maximum entropy principle

Some terminology in genetics

1. A locus is the position of a gene or other significant DNA sequence on a chromosome.
2. An allele is a variant form of a gene.
3. Diploid describes a cell or an organism that has paired chromosomes, one from each parent.
4. A mutation is a permanent change in the DNA sequence.
5. Recombination is the production of offspring with combinations of traits that differ from those found in either parent.


The TLD model

TLD is short for the two-locus diallelic model with mutation and recombination.

Model assumptions and notation:

- The two possible alleles at each of the two loci are $A_1, A_2$ and $B_1, B_2$, so the four possible types of gamete are $A_1B_1$, $A_1B_2$, $A_2B_1$ and $A_2B_2$.
- Recombination occurs at rate $C$: $A_iB_j + A_mB_n \to A_iB_n / A_mB_j$.
- Both loci have the same mutation rate $\mu$: $A_1 \leftrightarrow A_2$ and $B_1 \leftrightarrow B_2$.


Table 1: The proportions of gametes in generation $T$ and the expected proportions in generation $T + 1$ in the population. $N$ is the population size.

Generation   $A_1B_1$     $A_1B_2$     $A_2B_1$     $A_2B_2$
$T$          $x_1/(2N)$   $x_2/(2N)$   $x_3/(2N)$   $x_4/(2N)$
$T + 1$      $\phi_1$     $\phi_2$     $\phi_3$     $\phi_4$

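Suppose

\[
D(x) = \frac{x_1}{2N}\,\frac{x_4}{2N} - \frac{x_2}{2N}\,\frac{x_3}{2N}.
\]

We have

\[
\begin{aligned}
\phi_1(x) &= \frac{x_1}{2N}(1-\mu)^2 + \left(\frac{x_2}{2N} + \frac{x_3}{2N}\right)\mu(1-\mu) + \frac{x_4}{2N}\mu^2 - C\,D(x)(1-2\mu)^2, \\
\phi_2(x) &= \frac{x_2}{2N}(1-\mu)^2 + \left(\frac{x_1}{2N} + \frac{x_4}{2N}\right)\mu(1-\mu) + \frac{x_3}{2N}\mu^2 + C\,D(x)(1-2\mu)^2, \\
\phi_3(x) &= \frac{x_3}{2N}(1-\mu)^2 + \left(\frac{x_1}{2N} + \frac{x_4}{2N}\right)\mu(1-\mu) + \frac{x_2}{2N}\mu^2 + C\,D(x)(1-2\mu)^2, \\
\phi_4(x) &= \frac{x_4}{2N}(1-\mu)^2 + \left(\frac{x_2}{2N} + \frac{x_3}{2N}\right)\mu(1-\mu) + \frac{x_1}{2N}\mu^2 - C\,D(x)(1-2\mu)^2.
\end{aligned}
\]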

The transition probability of going from $x = (x_1, x_2, x_3, x_4)$ to $y = (y_1, y_2, y_3, y_4)$ is multinomial:

\[
p_{xy} = P(y \mid x) = \frac{(2N)!}{y_1!\,y_2!\,y_3!\,y_4!}\,\phi_1(x)^{y_1}\,\phi_2(x)^{y_2}\,\phi_3(x)^{y_3}\,\phi_4(x)^{y_4}.
\]

The TLD model is an irreducible aperiodic Markov chain, thus there exists a unique stationary distribution.
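One generation of the TLD chain can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, the expected proportions are assumed to take the form $\phi_1 = f_1(1-\mu)^2 + (f_2+f_3)\mu(1-\mu) + f_4\mu^2 - C D (1-2\mu)^2$ (and analogously for the others) with $f_i = x_i/(2N)$ and $D = f_1 f_4 - f_2 f_3$, and the multinomial draw is done with $2N$ weighted categorical draws.

```python
import random


def expected_gamete_proportions(x, mu, c):
    """Return (phi_1, ..., phi_4) for gamete counts x = (x1, x2, x3, x4)."""
    two_n = sum(x)
    f1, f2, f3, f4 = (xi / two_n for xi in x)
    d = f1 * f4 - f2 * f3                    # disequilibrium D(x)
    rec = c * d * (1 - 2 * mu) ** 2          # recombination term
    none, one, both = (1 - mu) ** 2, mu * (1 - mu), mu ** 2
    phi1 = f1 * none + (f2 + f3) * one + f4 * both - rec
    phi2 = f2 * none + (f1 + f4) * one + f3 * both + rec
    phi3 = f3 * none + (f1 + f4) * one + f2 * both + rec
    phi4 = f4 * none + (f2 + f3) * one + f1 * both - rec
    return (phi1, phi2, phi3, phi4)


def next_generation(x, mu, c, rng=random):
    """Sample the gamete counts of generation T + 1 given the counts x at T."""
    two_n = sum(x)
    phi = expected_gamete_proportions(x, mu, c)
    # 2N independent categorical draws are equivalent to one multinomial draw.
    draws = rng.choices(range(4), weights=phi, k=two_n)
    return tuple(draws.count(i) for i in range(4))
```

For example, `next_generation((10, 20, 30, 40), mu=0.01, c=0.1)` returns four counts that again sum to $2N = 100$.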

Diffusion approximation

The main idea is to rescale the discrete state space and time space by a factor related to the population size $N$, so that the gap between two successive states in the new space is infinitesimal when $N$ is large enough.

Example

- State space: $\{0, 1, 2, \dots, 2N\} \xrightarrow{\times (2N)^{-1}} \{0, \tfrac{1}{2N}, \tfrac{2}{2N}, \dots, 1\}$.
- Time space: $\{0, 1, 2, \dots\} \xrightarrow{\times (2N)^{-1}} \{0\,\delta t, 1\,\delta t, 2\,\delta t, \dots\}$, where $\delta t = \tfrac{1}{2N}$.

When $N$ is large enough, the new chain is approximately continuous. In the derivation, Taylor series expansion and the definition of the derivative,

\[
f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x},
\]

are used to obtain two important results for the TLD model.

The diffusion operator and master equation

The diffusion operator for the TLD model is

\[
\begin{aligned}
L ={}& \frac{1}{2}\,p(1-p)\frac{\partial^2}{\partial p^2} + \frac{1}{2}\,q(1-q)\frac{\partial^2}{\partial q^2} + \frac{1}{2}\left\{p(1-p)\,q(1-q) + D(1-2p)(1-2q) - D^2\right\}\frac{\partial^2}{\partial D^2} \\
&+ D\frac{\partial^2}{\partial p\,\partial q} + D(1-2p)\frac{\partial^2}{\partial p\,\partial D} + D(1-2q)\frac{\partial^2}{\partial q\,\partial D} \\
&+ \frac{\theta}{4}(1-2p)\frac{\partial}{\partial p} + \frac{\theta}{4}(1-2q)\frac{\partial}{\partial q} - D\left(1 + \frac{\rho}{2} + \theta\right)\frac{\partial}{\partial D},
\end{aligned}
\]

and the master equation at stationarity is

\[
E\{Lf(p, q, D)\} = 0.
\]

Here $p$ and $q$ are the frequencies of $A_1$ and $B_1$, $D = f_{11} - pq$, $f_{11}$ is the frequency of gamete $A_1B_1$, $\rho = 2NC$, $\theta = 2N\mu$, and $f$ is any twice continuously differentiable function with compact support.

The master equation means that, at stationarity, the expected rate of change over time of any such function of $p$, $q$ and $D$ is zero.

A simple example

Letting $f(p, q, D) = D$, we get

\[
Lf(p, q, D) = -D\left(1 + \frac{\rho}{2} + \theta\right),
\]

and so

\[
E\{Lf(p, q, D)\} = -\left(1 + \frac{\rho}{2} + \theta\right)E(D) = 0 \;\Longrightarrow\; E(D) = 0.
\]

Maximum entropy principle

Definition: Entropy is a quantitative measure of disorder in a system. Suppose a random variable $Z$ has $K$ possible outcomes with probabilities $p_1, p_2, p_3, \dots, p_K$; then the entropy is

\[
I(Z) = -\sum_{i=1}^{K} p_i \log_K(p_i).
\]

The maximum entropy principle states that, among all distributions consistent with what is known, the one that maximises the entropy is the most honest choice, because it assumes nothing beyond the given constraints.

An example

In a coin toss experiment, suppose $\Pr(\text{head}) = p$ and $\Pr(\text{tail}) = 1 - p$. Then the entropy is

\[
I = -p\log_2(p) - (1-p)\log_2(1-p).
\]

[Figure: Relationship of the entropy and $p$. The entropy is plotted against $p$ on $[0, 1]$ and reaches its maximum of 1.0 at $p = 0.5$.]
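The coin-toss entropy is easy to evaluate numerically. The short Python sketch below (the function name is illustrative) uses the convention $0 \log 0 = 0$ at the endpoints, which is an assumption not stated on the slide, and confirms that the entropy peaks at $p = 0.5$.

```python
import math


def binary_entropy(p):
    """Entropy, in bits, of a coin with Pr(head) = p."""
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log(0) = 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# Evaluate the entropy on a grid to locate its maximum.
grid = [k / 100 for k in range(101)]
best_p = max(grid, key=binary_entropy)
```

Here `best_p` is the grid point with the largest entropy, and `binary_entropy(best_p)` is the corresponding maximum.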

Maximum entropy principle

The Maxent solution of an unknown probability density function $\pi(p)$, given knowledge of $n$ moments $m_i = E(p^i)$, $i = 1, \dots, n$, is the solution $\pi_n(p)$ that maximizes

\[
I = -\int_{\Omega} \pi_n(p)\,\log\{\pi_n(p)\}\,dp
\]

subject to

\[
m_i = \int_{\Omega} p^i\,\pi_n(p)\,dp \quad \text{for } i = 0, 1, \dots, n,
\]

where $\Omega$ is the support of $\pi$ and $m_0 = 1$ is the normalisation constraint.

Considering the Lagrange function and the Euler-Lagrange equation, the Maxent solution is

\[
\pi_n(p) = \exp\left(\lambda_0 + \lambda_1 p + \lambda_2 p^2 + \dots + \lambda_n p^n\right),
\]

where $\lambda_i$, $i = 0, \dots, n$, are the solutions of

\[
\arg\min_{\lambda}\left\{\int_{\Omega} \exp\left(\lambda_0 + \lambda_1 p + \lambda_2 p^2 + \dots + \lambda_n p^n\right)dp - \sum_{i=0}^{n}\lambda_i m_i\right\}.
\]
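The minimisation over $\lambda$ can be sketched numerically as follows. This is a minimal illustration, not the numerical method used for the results in this talk: it assumes the support $\Omega = [0, 1]$, approximates the integrals with a midpoint rule, and minimises the convex objective $\int \exp(\sum_i \lambda_i p^i)\,dp - \sum_i \lambda_i m_i$ by Newton's method with backtracking. The function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def maxent_fit(moments, nodes=1000, max_iter=50, tol=1e-10):
    """Fit pi(p) = exp(sum_i lam[i] * p**i) on [0, 1] to moments [m_0, ..., m_n]."""
    size = len(moments)
    grid = [(k + 0.5) / nodes for k in range(nodes)]  # midpoint quadrature nodes
    weight = 1.0 / nodes

    def density(lam):
        return [math.exp(sum(l * p ** i for i, l in enumerate(lam))) for p in grid]

    def potential(lam):
        try:
            integral = weight * sum(density(lam))
        except OverflowError:
            return math.inf  # reject steps that blow up the exponential
        return integral - sum(l * m for l, m in zip(lam, moments))

    lam = [0.0] * size
    for _ in range(max_iter):
        dens = density(lam)
        # power[k] approximates the integral of p**k * pi(p) over [0, 1]
        power = [weight * sum(d * p ** k for d, p in zip(dens, grid))
                 for k in range(2 * size - 1)]
        grad = [power[i] - moments[i] for i in range(size)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[power[i + j] for j in range(size)] for i in range(size)]
        step = solve_linear(hess, [-g for g in grad])
        slope = sum(g * s for g, s in zip(grad, step))
        base, t = potential(lam), 1.0
        # Backtracking line search (Armijo rule with a small absolute slack).
        while t > 1e-8 and potential([l + t * s for l, s in zip(lam, step)]) \
                > base + 1e-4 * t * slope + 1e-12:
            t *= 0.5
        lam = [l + t * s for l, s in zip(lam, step)]
    return lam, grid, density(lam)
```

Calling `maxent_fit([1.0, 0.3])`, for instance, fits a density on $[0, 1]$ with total mass 1 and mean 0.3; with only the first moment constrained, the fitted density is a truncated exponential.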

Linkage disequilibrium coefficient $r^2$

The definition of $r^2$ is

\[
r^2 = \frac{D^2}{p(1-p)\,q(1-q)},
\]

where $p$ and $q$ are the frequencies of $A_1$ and $B_1$, $D = f_{11} - pq$, and $f_{11}$ is the frequency of gamete $A_1B_1$.
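As a concrete illustration of the definition, the following Python sketch computes $r^2$ from the four gamete frequencies. The function name and argument order ($A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$) are illustrative choices, and the function raises an error when either locus is monomorphic, since $r^2$ is then undefined.

```python
def r_squared(f11, f12, f21, f22):
    """r^2 from the frequencies of gametes A1B1, A1B2, A2B1 and A2B2."""
    p = f11 + f12              # frequency of allele A1
    q = f11 + f21              # frequency of allele B1
    d = f11 - p * q            # disequilibrium D = f11 - pq
    denom = p * (1 - p) * q * (1 - q)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom
```

Complete association, `r_squared(0.5, 0.0, 0.0, 0.5)`, gives 1, while independent loci, `r_squared(0.25, 0.25, 0.25, 0.25)`, give 0.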

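Reformulation of the problem

Let $u = 1 - 2p$ and $v = 1 - 2q$. Then the diffusion generator can be rewritten as

\[
\begin{aligned}
L ={}& \frac{1}{2}\left(1-u^2\right)\frac{\partial^2}{\partial u^2} + \frac{1}{2}\left(1-v^2\right)\frac{\partial^2}{\partial v^2} + \frac{1}{2}\left\{\frac{1}{16}\left(1-u^2\right)\left(1-v^2\right) + Duv - D^2\right\}\frac{\partial^2}{\partial D^2} \\
&+ 4D\frac{\partial^2}{\partial u\,\partial v} - 2Du\frac{\partial^2}{\partial D\,\partial u} - 2Dv\frac{\partial^2}{\partial D\,\partial v} \\
&- \frac{1}{2}\theta u\frac{\partial}{\partial u} - \frac{1}{2}\theta v\frac{\partial}{\partial v} - D\left(1 + \frac{1}{2}\rho + \theta\right)\frac{\partial}{\partial D}.
\end{aligned}
\]

This reparameterization yields

\[
r^2 = \frac{D^2}{p(1-p)\,q(1-q)} = \frac{16D^2}{\left(1-u^2\right)\left(1-v^2\right)}.
\]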

Analytic computation of the moments

Note that when $0 \le u^2, v^2 < 1$:

\[
\frac{1}{1-u^2} = \sum_{k=0}^{\infty} u^{2k} \quad\text{and}\quad \frac{1}{1-v^2} = \sum_{l=0}^{\infty} v^{2l}.
\]

Considering the new form of $r^2$, it follows that for $M = 1, 2, \dots$

\[
E\left(r^{2M}\right) = 16^M \sum_{k_1=0}^{\infty}\cdots\sum_{k_M=0}^{\infty}\;\sum_{l_1=0}^{\infty}\cdots\sum_{l_M=0}^{\infty} E\left\{D^{2M}\,u^{2(k_1+k_2+\dots+k_M)}\,v^{2(l_1+l_2+\dots+l_M)}\right\},
\]

which can be simplified to

\[
E\left(r^{2M}\right) = 16^M \sum_{K=0}^{\infty}\sum_{L=0}^{\infty}\binom{K+M-1}{M-1}\binom{L+M-1}{M-1}\,E\left(D^{2M}u^{2K}v^{2L}\right).
\]
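The simplification rests on a counting fact: the number of ways to write $K$ as an ordered sum $k_1 + \dots + k_M$ of non-negative integers is $\binom{K+M-1}{M-1}$, so all terms of the $M$-fold sum with the same total $K$ can be grouped. A brute-force check of this identity in Python (illustrative only; the function name is hypothetical) might look like:

```python
from itertools import product
from math import comb


def count_compositions(total, parts):
    """Count tuples (k_1, ..., k_parts) of non-negative integers summing to total."""
    return sum(1 for ks in product(range(total + 1), repeat=parts)
               if sum(ks) == total)


# Compare the brute-force count with the binomial coefficient for small cases.
checks = [(K, M, count_compositions(K, M), comb(K + M - 1, M - 1))
          for K in range(6) for M in range(1, 4)]
```

Each entry of `checks` pairs the brute-force count with the binomial coefficient for the same $(K, M)$.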

Our problem now is how to compute $E\left(D^{2M}u^{2K}v^{2L}\right)$ for all possible $M$, $K$ and $L$.

Step 1: Letting $f$ in the master equation take specific forms such as $u^n$, $uv$, $u^2v$ and $Du^n$, we obtain $E(u^n)$, $E(uv)$, $E\left(u^2v\right)$, etc.

Step 2: Given a value of $(m, n)$, substitute the function $f = D^k u^{m+2-k} v^{n+2-k}$ into the master equation for each $k \in \{0, 1, 2, \dots, n+2\}$. This generates a system of $n + 3$ linear equations, whose solutions include

\[
E\left(u^{m+2}v^{n+2}\right),\quad E\left(Du^{m+1}v^{n+1}\right),\quad E\left(D^2u^{m}v^{n}\right).
\]
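As a small worked instance of Step 1, take $f = u^n$. Only the terms $\frac{1}{2}(1-u^2)\,\partial^2/\partial u^2$ and $-\frac{1}{2}\theta u\,\partial/\partial u$ of the generator act on it, so the master equation gives $\frac{n(n-1)}{2}\{E(u^{n-2}) - E(u^n)\} - \frac{n\theta}{2}E(u^n) = 0$, that is, $E(u^n) = \frac{n-1}{n-1+\theta}E(u^{n-2})$, with $E(u^0) = 1$ and $E(u) = 0$ for $\theta > 0$. The Python sketch below (the function name is hypothetical) evaluates this recursion exactly with rational arithmetic.

```python
from fractions import Fraction


def moment_u(n, theta):
    """E(u^n) at stationarity, from the master equation with f = u^n (theta > 0)."""
    theta = Fraction(theta)
    if n % 2 == 1:
        return Fraction(0)  # E(u) = 0, and the recursion keeps odd moments at 0
    value = Fraction(1)     # E(u^0) = 1
    for k in range(2, n + 1, 2):
        # E(u^k) = (k - 1) / (k - 1 + theta) * E(u^(k - 2))
        value *= Fraction(k - 1) / (k - 1 + theta)
    return value
```

For instance, `moment_u(2, 1)` returns `Fraction(1, 2)`, in agreement with $E(u^2) = 1/(1+\theta)$.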

Given $M \in \mathbb{N}$ and a truncation level $l_{\max} \in \mathbb{N}$, we use

\[
E\left(r^{2M}\right)_{l_{\max}} = 16^M \sum_{\substack{2K+2L \le l_{\max} \\ K, L \ge 0}} \binom{K+M-1}{M-1}\binom{L+M-1}{M-1}\,E\left(D^{2M}u^{2K}v^{2L}\right)
\]

to approximate the $M$-th moment of $r^2$.
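The truncated sum can be sketched as a double loop. In the Python illustration below, `joint_moment(M, K, L)` is a hypothetical callable standing in for $E(D^{2M}u^{2K}v^{2L})$ (which in practice comes from solving the linear systems of Step 2), and the truncation is assumed to keep every term with $2K + 2L \le l_{\max}$.

```python
from math import comb


def truncated_moment(M, l_max, joint_moment):
    """Approximate E(r^(2M)) by summing all terms with 2K + 2L <= l_max."""
    half = l_max // 2          # 2K + 2L <= l_max is the same as K + L <= half
    total = 0.0
    for K in range(half + 1):
        for L in range(half - K + 1):
            weight = comb(K + M - 1, M - 1) * comb(L + M - 1, M - 1)
            total += weight * joint_moment(M, K, L)
    return 16 ** M * total


def toy_moment(M, K, L):
    """Toy stand-in for the joint moments, used only to exercise the sum."""
    return 0.5 ** (K + L) / 16 ** M
```

With the toy stand-in, the untruncated sum for $M = 1$ is $\sum_{K,L \ge 0} 0.5^{K+L} = 4$, and `truncated_moment(1, 200, toy_moment)` approaches this value.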

Variance of $r^2$

Table 2: $V\left(r^2\right)$ computed by our method ($l_{\max} = 700$). Rows are indexed by $\rho$ and columns by $\theta$.

ρ \ θ    0.0125     0.0500    0.1000    0.2500    0.7500    1.2500
0.00     0.006119   0.03109   0.03806   0.02433   0.00685   0.003239
0.25     0.003412   0.01950   0.02614   0.01891   0.00608   0.002928
0.50     0.002271   0.01372   0.01932   0.01517   0.00538   0.002739
1.25     0.001062   0.00666   0.00991   0.00892   0.00403   0.002197
2.50     0.000526   0.00327   0.00487   0.00476   0.00263   0.001590
5.00     0.000244   0.00142   0.00206   0.00202   0.00141   0.000961

[Figure: two panels plotting $V\left(r^2\right)$ against $\theta$, one for $\rho = 0.25$ and one for $\rho = 2.50$, each showing the Liu (2012) method and the analytic method.]

Figure 1: Comparison of $V\left(r^2\right)$ between our analytic method and the method in Liu (2012).

Probability density function of $r^2$

- We can compute 50 moments in 2.5 hours with $l_{\max} = 2000$ on a laptop.
- Use $n = 18$ moments to calculate the Maxent density $\tilde{\pi}\left(r^2\right)$.
- Compare moments $19, 20, \dots, 50$ computed from $\tilde{\pi}\left(r^2\right)$ with those from the analytic method.

[Figure: for $\theta = 0.125$, $\rho = 5$ (1000 nodes, $n = 18$ moments): the Maxent density $\tilde{\pi}_n\left(r^2\right)$ plotted against $r^2$, and the higher-order analytic moments compared with the Maxent moments, plotted against the order of the moment.]

[Figure: for $\theta = 0.0125$, $\rho = 0$ (1000 nodes, $n = 16$ moments): the Maxent density $\tilde{\pi}_n\left(r^2\right)$ plotted against $r^2$, and the higher-order analytic moments compared with the Maxent moments, plotted against the order of the moment.]

Figure 2: Stationary density functions of $r^2$ for two pairs of $\theta$ and $\rho$, approximated by the numerical univariate Maxent method.

Acknowledgements

I would like to sincerely thank my supervisor Rachel Fewster for her guidance, and Dr. Jesse Goodman and Dr. Jing Liu for their useful ideas.

References

- Y. S. Song and J. S. Song, "Analytic computation of the expectation of the linkage disequilibrium coefficient $r^2$," Theoretical Population Biology, vol. 71, no. 1, pp. 49-60, 2007.
- J. Liu, Reconstruction of probability distributions in population genetics. PhD thesis, The University of Auckland, 2012.
- X. Wu, "Calculation of maximum entropy densities with application to income distribution," Journal of Econometrics, vol. 115, no. 2, pp. 347-354, 2003.
- M. Slatkin, "Linkage disequilibrium - understanding the evolutionary past and mapping the medical future," Nature Reviews Genetics, vol. 9, no. 6, pp. 477-485, 2008.
- W. G. Hill and A. Robertson, "Linkage disequilibrium in finite populations," Theoretical and Applied Genetics, vol. 38, no. 6, pp. 226-231, 1968.
- L. R. Mead and N. Papanicolaou, "Maximum entropy in the problem of moments," Journal of Mathematical Physics, vol. 25, no. 8, pp. 2404-2417, 1984.

Thanks!