Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda


1. Population Genetics with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics, 6357a Gonda, csabatti@mednet.ucla.edu

2. Hardy-Weinberg
Hypotheses: infinite population; no inbreeding; random mating; no mutation; no migration; no selection; no overlapping generations.
Prediction: if alleles A and a have frequencies $p$ and $q = 1 - p$, the genotypes AA, Aa and aa have frequencies $p^2$, $2pq$ and $q^2$.
Consequences: variation is maintained, and the equilibrium is reached in one generation.

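3. Equilibrium
Parental genotype frequencies: AA $p^2$, Aa $2pq$, aa $q^2$.

| Type of mating | Frequency of mating | Offspring AA | Offspring Aa | Offspring aa |
| AA × AA | $p^4$ | $p^4$ | | |
| AA × Aa | $4p^3q$ | $2p^3q$ | $2p^3q$ | |
| Aa × Aa | $4p^2q^2$ | $p^2q^2$ | $2p^2q^2$ | $p^2q^2$ |
| AA × aa | $2p^2q^2$ | | $2p^2q^2$ | |
| Aa × aa | $4pq^3$ | | $2pq^3$ | $2pq^3$ |
| aa × aa | $q^4$ | | | $q^4$ |
| Total | 1 | $p^2$ | $2pq$ | $q^2$ |

For example, the total for AA offspring is
$$p^4 + 2p^3q + p^2q^2 = p^2(p^2 + 2pq + q^2) = p^2(p + q)^2 = p^2.$$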

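4. Achieved in one generation
Start from arbitrary genotype frequencies AA $h$, Aa $d$, aa $r$. The allele frequencies are $p = h + \tfrac12 d$ for A and $q = r + \tfrac12 d$ for a. Under random mating, gametes combine independently:

| | A ($p$) | a ($q$) |
| A ($p$) | AA: $pp$ | Aa: $qp$ |
| a ($q$) | Aa: $pq$ | aa: $qq$ |

so after one generation the genotype frequencies are $p^2$, $2pq$, $q^2$, whatever $h$, $d$, $r$ were.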
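The one-generation claim can be checked by brute force: sum the offspring of every mating type, starting from genotype frequencies that are far from Hardy-Weinberg proportions. This is a minimal sketch; the function name `offspring_genotypes` and the example frequencies are illustrative choices, and exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Probability that a parent of each genotype transmits allele A
TRANSMIT_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}

def offspring_genotypes(h, d, r):
    """Offspring (AA, Aa, aa) frequencies after one round of random mating,
    starting from parental genotype frequencies h (AA), d (Aa), r (aa)."""
    parents = {"AA": h, "Aa": d, "aa": r}
    off_AA = off_Aa = off_aa = Fraction(0)
    # Sum over ordered matings g1 x g2, each with frequency f1 * f2
    for g1, f1 in parents.items():
        for g2, f2 in parents.items():
            m = f1 * f2
            a1, a2 = TRANSMIT_A[g1], TRANSMIT_A[g2]
            off_AA += m * a1 * a2
            off_Aa += m * (a1 * (1 - a2) + (1 - a1) * a2)
            off_aa += m * (1 - a1) * (1 - a2)
    return off_AA, off_Aa, off_aa

# Arbitrary starting genotype frequencies (hypothetical example values)
h, d, r = Fraction(3, 10), Fraction(1, 10), Fraction(6, 10)
p = h + d / 2
first = offspring_genotypes(h, d, r)
print(first, (p * p, 2 * p * (1 - p), (1 - p) ** 2))
print(offspring_genotypes(*first) == first)  # True: already at equilibrium
```

Ordered matings are summed here instead of the unordered mating table; the two are equivalent, since each unordered pair of distinct genotypes appears twice.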

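5. Selection
Hypotheses: infinite population; no inbreeding; random mating; no mutation; no migration; selection acting on one gene alone.
Statement: two alleles, A and a, with frequencies $p$ and $1 - p = q$ in the current generation; the genotypes AA, Aa, aa have fitness $y$, $w$, $z \le 1$.

| Genotype | Freq. before selection | Relative fitness | After selection | Freq. after selection |
| AA | $p^2$ | $y$ | $p^2 y$ | $p^2 y / T$ |
| Aa | $2pq$ | $w$ | $2pqw$ | $2pqw / T$ |
| aa | $q^2$ | $z$ | $q^2 z$ | $q^2 z / T$ |
| Total | 1 | | $T$ | 1 |

Writing FaS for the frequency after selection, the frequency of gamete A, and hence of A in the next generation, is
$$p' = \frac{\mathrm{FaS}(AA) + \tfrac12\,\mathrm{FaS}(Aa)}{\mathrm{FaS}(AA) + \mathrm{FaS}(Aa) + \mathrm{FaS}(aa)} = \frac{p^2 y + pqw}{p^2 y + 2pqw + q^2 z}.$$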
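The recursion for $p'$ translates directly into a one-line update. This is a sketch; the name `next_p` is ours, and the argument names mirror the fitnesses $y$, $w$, $z$ in the table.

```python
def next_p(p, y, w, z):
    """Frequency of A in the next generation, given its current frequency p
    and the fitnesses y (AA), w (Aa), z (aa)."""
    q = 1 - p
    T = p * p * y + 2 * p * q * w + q * q * z  # total after selection (mean fitness)
    return (p * p * y + p * q * w) / T

# No selection (all fitnesses equal): the frequency stays (up to rounding) at 0.3
print(next_p(0.3, 1, 1, 1))
# Dominant disease allele with fitness 0 for AA and Aa
print(next_p(0.2, 0, 0, 1))  # 0.0
```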

6. Examples
The frequency of an allele at time $t$, $p_t$, is a function of $p_{t-1}$ and of the fitness parameters: $p_t = f(p_{t-1}, y, w, z)$. Let A be a disease allele and a the wild type, with $s, r \le 1$:

| Genotype | Fitness | Dominant | Recessive | Heterozygous advantage |
| AA | $y$ | $s$ | $s$ | $s$ |
| Aa | $w$ | $s$ | 1 | 1 |
| aa | $z$ | 1 | 1 | $r$ |

If A is dominant and $s = 0$:
$$p_t = \frac{p_{t-1}^2 \cdot 0 + p_{t-1}(1 - p_{t-1}) \cdot 0}{T} = 0,$$
so the disease allele disappears immediately.

7. Effect of different selection pressures ($s = 0.8$, $r = 0.9$). Figure: frequency of A over 300 generations under the dominant, recessive and heterozygous-advantage schemes.
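Curves like those in the figure can be regenerated by iterating the selection recursion under the three fitness schemes of slide 6. This is a sketch: the starting frequency of 0.5 is an assumption read off the plotted range, not something the slide states.

```python
def next_p(p, y, w, z):
    """Selection recursion from slide 5: p' = (p^2 y + p q w) / T."""
    q = 1 - p
    T = p * p * y + 2 * p * q * w + q * q * z
    return (p * p * y + p * q * w) / T

def trajectory(p0, fitness, generations=300):
    """Frequencies of A over `generations` generations of selection."""
    ps = [p0]
    for _ in range(generations):
        ps.append(next_p(ps[-1], *fitness))
    return ps

s, r = 0.8, 0.9
# Fitnesses (y, w, z) of AA, Aa, aa from slide 6; A is the disease allele
schemes = {
    "dominant": (s, s, 1.0),
    "recessive": (s, 1.0, 1.0),
    "heterozygous advantage": (s, 1.0, r),
}
# Starting frequency 0.5 is an assumption
trajs = {name: trajectory(0.5, f) for name, f in schemes.items()}
for name, ps in trajs.items():
    print(f"{name}: p after 300 generations = {ps[-1]:.4f}")
```

Under heterozygous advantage the frequency settles at the interior equilibrium $(w - z)/(2w - y - z)$, which is $1/3$ for these values. Under the dominant scheme A is removed much faster than under the recessive one, where selection acts only on the rare AA homozygotes.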

8. Mutation
All dominant, fully penetrant, lethal, early-onset disorders are due to new mutations. More generally, if the frequency of an allele on which selection acts stays constant, there must be an equilibrium between mutation and selection: the change in $p$ due to mutation, $\mu$, balances the change in $p$ due to selection. From the selection equation above and some approximations, you get formulas like the one in your book.
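A numerical look at mutation-selection balance, under a model we assume here: each generation, selection acts as on slide 5, and then a fraction $\mu$ of the a alleles mutate to A. In the dominant case with small $p$, selection leaves roughly $s\,p$, so balance requires $p \approx s\,p + \mu$, that is $p \approx \mu/(1 - s)$. This approximation is our derivation in the slide's notation (where $s$ is the fitness of affected genotypes) and may be written differently in the book. The function name and the value $\mu = 10^{-5}$ are illustrative.

```python
def next_p(p, y, w, z):
    """Selection recursion from slide 5."""
    q = 1 - p
    T = p * p * y + 2 * p * q * w + q * q * z
    return (p * p * y + p * q * w) / T

def mutation_selection_equilibrium(mu, y, w, z, p0=0.0, generations=10_000):
    """Iterate selection followed by mutation a -> A at rate mu (an assumed
    model: one-way mutation to the disease allele, applied after selection)."""
    p = p0
    for _ in range(generations):
        p_sel = next_p(p, y, w, z)
        p = p_sel + mu * (1 - p_sel)  # a fraction mu of the a alleles become A
    return p

mu = 1e-5
# Dominant lethal (s = 0): every copy of A in a generation is a new mutation
print(mutation_selection_equilibrium(mu, 0.0, 0.0, 1.0))
# Dominant with s = 0.5: compare with the approximation mu / (1 - s)
print(mutation_selection_equilibrium(mu, 0.5, 0.5, 1.0), mu / (1 - 0.5))
```

For the lethal case the equilibrium frequency is exactly $\mu$, which is the statement on the slide that all such cases are new mutations.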

9. Drift: finite populations
Premises: no inbreeding; random mating; no mutation; no migration; no selection; population of $N$ individuals. The alleles at generation $t + 1$ are a sample with replacement of the alleles at generation $t$.
Statement: the allele frequency at each generation is random (just as the number of males among the children of one family is random).
Figure: distribution of the allele frequency at the next generation for $N = 10$, $p = 0.5$ (probability against frequency at the next generation).

10. Distribution of $p_t$
$p_t = X_t/(2N)$, where $X_t$ is the number of alleles of type A at generation $t$. $X_t$ has a binomial distribution:
$$\Pr(X_{t+1} = j \mid X_t = i) = \binom{2N}{j}\left(\frac{i}{2N}\right)^j \left(1 - \frac{i}{2N}\right)^{2N - j} = p_{ij},$$
$$E(X_{t+1} \mid X_t = i) = \frac{i}{2N}\, 2N = i, \qquad \operatorname{Var}\left(\frac{X_{t+1}}{2N} \,\middle|\, X_t = i\right) = \frac{i}{2N}\left(1 - \frac{i}{2N}\right)\frac{1}{2N}.$$
If $N$ is very large, you do not see much effect of randomness. The values 0 and 1 are special: if $p_t$ ever becomes either 0 or 1, it cannot change from that value any more. This is FIXATION.
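The transition probabilities $p_{ij}$ can be tabulated directly, which also reproduces the distribution plotted on slide 9 ($N = 10$, $p = 0.5$, so $i = 10$ of $2N = 20$ alleles). A minimal sketch using only the standard library; `transition_probs` is a name chosen here.

```python
from math import comb

def transition_probs(i, N):
    """p_ij = P(X_{t+1} = j | X_t = i) for j = 0..2N under binomial sampling."""
    n = 2 * N
    x = i / n
    return [comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(n + 1)]

N, i = 10, 10  # N = 10 individuals, current frequency p = i / (2N) = 0.5
probs = transition_probs(i, N)
mean = sum(j * pj for j, pj in enumerate(probs))
# Variance of the next-generation frequency X_{t+1} / (2N)
var_freq = sum((j / (2 * N) - i / (2 * N)) ** 2 * pj for j, pj in enumerate(probs))
print(mean, var_freq)
```

The computed mean equals $i$ and the variance equals $\frac{i}{2N}(1 - \frac{i}{2N})\frac{1}{2N}$, as in the formulas above.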

11. Probability of fixation
$\pi_i = \Pr(X_t \text{ reaches } 2N \text{ before } 0 \mid X_0 = i)$, with $\pi_0 = 0$, $\pi_{2N} = 1$ and
$$\pi_i = \sum_{j=0}^{2N} p_{ij}\,\pi_j,$$
where $p_{ij}$ is the probability that $X_1 = j$ given $X_0 = i$. $\pi_i = i/(2N)$ satisfies this recursive relation (it is the expression for the mean of a binomial distribution, divided by $2N$).
⇒ The probability with which an allele gets fixed is equal to its initial frequency.
⇒ If you add the fixation probabilities of the two alleles (A and a), you get 1: fixation happens for sure.
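Both facts can be checked numerically: first that $\pi_i = i/(2N)$ satisfies the recursion exactly, then, by simulating many populations until absorption, that the fraction in which A is fixed is close to its initial frequency. A sketch; the function names, seed and number of replicates are illustrative.

```python
import random
from math import comb

def transition_probs(i, N):
    """p_ij for j = 0..2N under binomial sampling of 2N alleles."""
    n = 2 * N
    x = i / n
    return [comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(n + 1)]

def check_recursion(N):
    """Largest deviation from pi_i = sum_j p_ij * pi_j when pi_j = j / (2N)."""
    n = 2 * N
    return max(
        abs(sum(pij * j / n for j, pij in enumerate(transition_probs(i, N))) - i / n)
        for i in range(n + 1)
    )

def simulate_fixation(i, N, reps, seed=1):
    """Fraction of simulated populations in which A (initially i copies) is fixed."""
    rng = random.Random(seed)
    n = 2 * N
    fixed = 0
    for _ in range(reps):
        x = i
        while 0 < x < n:  # run until absorption at 0 or 2N
            p = x / n
            x = sum(rng.random() < p for _ in range(n))  # Binomial(2N, p) draw
        fixed += x == n
    return fixed / reps

print(check_recursion(10), simulate_fixation(5, 10, 4000))  # the second is close to 5/20
```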

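12. How fast does fixation occur?
Let $h(t)$ be the probability that two alleles chosen at random in generation $t$ are different. This happens only if you do not choose the same parental gene twice (probability $1 - 1/(2N)$) and the two parental genes you choose are themselves different (probability $h(t-1)$):
$$h(t) = \left(1 - \frac{1}{2N}\right) h(t-1), \qquad h(t) = \left(1 - \frac{1}{2N}\right)^t h(0), \qquad \lim_{t \to \infty} h(t) = 0.$$
⇒ The convergence is slower, the closer $1 - 1/(2N)$ is to 1.
⇒ The convergence is slower, the larger $N$.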
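A quick way to see the effect of $N$ is to compute how many generations it takes for $h(t)$ to fall to half of $h(0)$. The half-life below follows directly from the closed form; for large $N$ it is close to $2N \ln 2$ because $-\ln(1 - x) \approx x$ for small $x$. The function names are illustrative.

```python
import math

def h(t, N, h0):
    """Probability that two randomly chosen alleles differ after t generations."""
    return (1 - 1 / (2 * N)) ** t * h0

def half_life(N):
    """Generations needed for h(t) to fall to h(0) / 2."""
    return math.log(2) / -math.log(1 - 1 / (2 * N))

for N in (10, 50):
    print(N, round(half_life(N), 1), h(100, N, 0.5))
```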

13. Population size = 10. Figure: frequency of A over 100 generations.

14. Population size = 50. Figure: frequency of A over 100 generations.
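Trajectories like those in the two figures can be simulated by repeated binomial sampling. This sketch assumes a starting frequency of 0.5 and runs 100 replicates per population size (both are our choices); it counts how many replicates have reached 0 or 1 by generation 100.

```python
import random

def wf_trajectory(N, p0, generations, rng):
    """One trajectory of the frequency of A in a population of N individuals
    (2N alleles), sampling alleles with replacement each generation."""
    n = 2 * N
    x = round(p0 * n)
    freqs = [x / n]
    for _ in range(generations):
        p = x / n
        x = sum(rng.random() < p for _ in range(n))  # Binomial(2N, p) draw
        freqs.append(x / n)
    return freqs

rng = random.Random(2024)
runs = {N: [wf_trajectory(N, 0.5, 100, rng) for _ in range(100)] for N in (10, 50)}
# Number of replicates fixed or lost by generation 100
absorbed = {N: sum(f[-1] in (0.0, 1.0) for f in trajs) for N, trajs in runs.items()}
print(absorbed)
```

With $N = 10$ almost every replicate is fixed or lost within 100 generations, whereas with $N = 50$ many are still polymorphic, in line with the half-life computed from $h(t)$.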

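15. Allele frequencies in different populations
$F$ indicates the frequency (probability) of homozygous individuals. We consider $B$ populations (of the same size), $b = 1, \dots, B$, and a marker with alleles $i = 1, \dots, K$.

| Population | Allele 1 | Allele 2 | ... | Allele K | F |
| Pop 1 | $p_{11}$ | $p_{21}$ | ... | $p_{K1}$ | $F_1 = \sum_{i=1}^K p_{i1}^2$ |
| Pop 2 | $p_{12}$ | $p_{22}$ | ... | $p_{K2}$ | $F_2 = \sum_{i=1}^K p_{i2}^2$ |
| ... | | | | | |
| Pop B | $p_{1B}$ | $p_{2B}$ | ... | $p_{KB}$ | $F_B = \sum_{i=1}^K p_{iB}^2$ |
| Tot | $p_1 = \frac1B \sum_{b=1}^B p_{1b}$ | $p_2$ | ... | $p_K$ | $F_T = \sum_{i=1}^K p_i^2$ |

and $F_S = \frac1B \sum_{b=1}^B F_b$.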

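16. A relation between homozygosities
$$F_S - F_T = \frac1B \sum_{b=1}^B \sum_{i=1}^K p_{ib}^2 - \sum_{i=1}^K p_i^2 = \sum_{i=1}^K \left[\frac1B \sum_{b=1}^B p_{ib}^2 - p_i^2\right]$$
$$= \sum_{i=1}^K \frac1B \left[\sum_{b=1}^B \left(p_{ib}^2 + p_i^2 - 2p_i p_{ib}\right) + \sum_{b=1}^B \left(2p_i p_{ib} - 2p_i^2\right)\right]$$
$$= \sum_{i=1}^K \left[\frac1B \sum_{b=1}^B (p_{ib} - p_i)^2 + \frac{2p_i}{B} \sum_{b=1}^B (p_{ib} - p_i)\right] = \sum_{i=1}^K \operatorname{Var}(p_i) + 0 \ge 0,$$
where $\operatorname{Var}(p_i)$ is the variance of the frequency of allele $i$ across the $B$ populations. Equality holds only when all populations have the same allele frequencies.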

17. Measure of stratification
$$F_{ST} = \frac{F_S - F_T}{1 - F_T} = \frac{H_T - H_S}{H_T},$$
where $H = 1 - F$ is the heterozygosity. If all populations have the same allele frequencies, $F_{ST} = 0$. $F_{ST}$ can be used to define a distance between populations. Note that if you have only one allele, $F = 1$ and $H = 0$ ⇒ heterozygosity is a measure of diversity.
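The quantities of slides 15 to 17 can be computed from a matrix of allele frequencies, one row per population, and the identity $F_S - F_T = \sum_i \operatorname{Var}(p_i)$ checked along the way. A minimal sketch; the function name and the two-population example are hypothetical.

```python
def homozygosities(freqs):
    """freqs[b][i] = frequency of allele i in population b (rows sum to 1).
    Returns F_S, F_T, F_ST and the sum over alleles of the across-population variance."""
    B, K = len(freqs), len(freqs[0])
    F_b = [sum(p * p for p in pop) for pop in freqs]          # homozygosity per population
    F_S = sum(F_b) / B
    p_bar = [sum(pop[i] for pop in freqs) / B for i in range(K)]  # pooled frequencies
    F_T = sum(p * p for p in p_bar)
    var_sum = sum(sum((pop[i] - p_bar[i]) ** 2 for pop in freqs) / B for i in range(K))
    F_ST = (F_S - F_T) / (1 - F_T)
    return F_S, F_T, F_ST, var_sum

# Two hypothetical populations with opposite frequencies of a biallelic marker
print(homozygosities([[0.2, 0.8], [0.8, 0.2]]))  # F_S = 0.68, F_T = 0.5, F_ST = 0.36
```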

18-20. Some empirical examples (figures).

21. Part II: Linkage disequilibrium

22. Gametic phase equilibrium
Premises: no inbreeding; random mating; no mutation; no migration; no selection; no linkage between the two loci, ...
Statement: marker 1 has alleles $A_1, \dots, A_r$ with frequencies $p_1, \dots, p_r$, and marker 2 has alleles $B_1, \dots, B_c$ with frequencies $\gamma_1, \dots, \gamma_c$. The haplotype $(A_i, B_j)$ has frequency
$$\mathrm{Fr}(A_i B_j) = p_i \gamma_j.$$
What if the loci are linked?

23. The family scale (meiosis)
Diagram: a parent carries two chromosomes, A1-B2 and A2-B3; a cross-over between the two loci produces the recombinant gametes A1-B3 and A2-B2.
The distance between the loci influences the probability of a cross-over (map functions are mathematical models for this). The closer the loci, the less independent the alleles we inherit from one parent. Within a family, there is no gametic phase equilibrium for linked loci: there is always linkage disequilibrium.

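24. The population scale
The chromosomes of one generation are either chromosomes of the previous one or recombined versions of them. Let $\theta$ be the recombination fraction between loci 1 and 2, with two alleles at each locus, A/a and B/b, and allele frequencies $p$ (for A) and $\gamma$ (for B).
Haplotype distribution at generation $t$:

| | B | b | |
| A | $\pi^t_{AB}$ | $\pi^t_{Ab}$ | $p$ |
| a | $\pi^t_{aB}$ | $\pi^t_{ab}$ | $1 - p$ |
| | $\gamma$ | $1 - \gamma$ | 1 |

Writing $D_t = \pi^t_{AB} - p\gamma$, the same table is

| | B | b | |
| A | $p\gamma + D_t$ | $p(1 - \gamma) - D_t$ | $p$ |
| a | $(1 - p)\gamma - D_t$ | $(1 - p)(1 - \gamma) + D_t$ | $1 - p$ |
| | $\gamma$ | $1 - \gamma$ | 1 |

and at equilibrium $D_t = 0$.
At the next generation, a chromosome is non-recombinant with probability $1 - \theta$ (and then carries AB with probability $\pi^t_{AB}$) or recombinant with probability $\theta$ (and then its alleles at the two loci come from two different chromosomes, so it carries AB with probability $p\gamma$):
$$\pi^{t+1}_{AB} = \pi^t_{AB}(1 - \theta) + p\gamma\theta.$$
If $\pi^t_{AB} = p\gamma$, then $\pi^{t+1}_{AB} = \pi^t_{AB}$ ⇒ equilibrium.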

25. Disequilibrium over time
$$\pi^{t+1}_{AB} = \pi^t_{AB}(1 - \theta) + p\gamma\theta$$
$$\pi^{t+1}_{AB} - p\gamma = \pi^t_{AB}(1 - \theta) + p\gamma(\theta - 1)$$
$$D_{t+1} = (1 - \theta)(\pi^t_{AB} - p\gamma) = (1 - \theta) D_t$$
$$D_{t+1} = D_0 (1 - \theta)^{t+1}$$
- equilibrium is not achieved in one generation;
- disequilibrium between two markers decreases with the number of generations;
- disequilibrium between two markers decreases faster, the higher the recombination between the markers.
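The closed form can be checked against the haplotype recursion it came from. In this sketch the allele frequencies $p = 0.3$, $\gamma = 0.6$, the initial $D_0 = 0.1$ and $\theta = 0.05$ are arbitrary illustrative values (with $D_0$ small enough that all four haplotype frequencies stay non-negative).

```python
def ld_by_recursion(D0, p, gamma, theta, generations):
    """Iterate pi_AB(t+1) = pi_AB(t) (1 - theta) + p gamma theta; return D_t."""
    pi_AB = p * gamma + D0
    Ds = [D0]
    for _ in range(generations):
        pi_AB = pi_AB * (1 - theta) + p * gamma * theta
        Ds.append(pi_AB - p * gamma)
    return Ds

def ld_closed_form(D0, theta, t):
    """D_t = D_0 (1 - theta)^t."""
    return D0 * (1 - theta) ** t

Ds = ld_by_recursion(0.1, 0.3, 0.6, 0.05, 20)
print([round(D, 4) for D in Ds[:5]])
```

With unlinked loci ($\theta = 0.5$) the disequilibrium halves every generation, so even then equilibrium is approached only gradually.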

26. Decay of disequilibrium as a function of the distance between markers and of generations. Figure: disequilibrium $D$ against the distance between markers (0 to 0.10 Morgans), for $t = 0, 10, 20, 100, 200$.
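To draw curves like these, a distance in Morgans must first be turned into a recombination fraction. The slide does not say which map function was used; this sketch assumes Haldane's map function, $\theta = \tfrac12(1 - e^{-2d})$, and an initial $D_0 = 0.25$ (the largest possible value when both allele frequencies are 0.5). Both choices are assumptions.

```python
import math

def haldane_theta(d):
    """Recombination fraction for map distance d in Morgans (Haldane's map function)."""
    return 0.5 * (1 - math.exp(-2 * d))

def ld_after(D0, d, t):
    """D_t = D_0 (1 - theta)^t for markers d Morgans apart."""
    return D0 * (1 - haldane_theta(d)) ** t

D0 = 0.25  # assumed initial disequilibrium
distances = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]
for t in (0, 10, 20, 100, 200):
    print(t, [round(ld_after(D0, d, t), 4) for d in distances])
```

For short distances $\theta \approx d$, so the choice of map function matters little over the 0 to 0.1 Morgan range of the figure.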

27. The statistics of linkage disequilibrium
So far, we have done some abstract mathematical modeling and looked at pictures of data. How do we create such pictures, and how do we interpret them? There are two aspects of statistical inference:
1. Exploratory analysis: how do we measure LD?
2. Confirmatory analysis: how can we be sure that the LD we observe is not due to mere random chance?
We will go into some detail about how to do this (a) when we have a random sample of haplotypes (alleles at two markers, phase known) and look for LD between the two markers, and (b) when we are interested in LD between a marker and a disease locus and have a random sample of the marker alleles for a group of disease-bearing and non-disease-bearing chromosomes (association mapping).

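28. Linkage disequilibrium: table view
Marker A has alleles $A_1, \dots, A_r$ and marker B has alleles $B_1, \dots, B_c$.
Population haplotype frequencies ($\pi$):

| | $B_1$ | $B_2$ | ... | $B_c$ | |
| $A_1$ | $\pi_{11}$ | $\pi_{12}$ | ... | $\pi_{1c}$ | $p_1$ |
| $A_2$ | $\pi_{21}$ | $\pi_{22}$ | ... | $\pi_{2c}$ | $p_2$ |
| ... | | | | | |
| $A_r$ | $\pi_{r1}$ | $\pi_{r2}$ | ... | $\pi_{rc}$ | $p_r$ |
| | $q_1$ | $q_2$ | ... | $q_c$ | 1 |

Under equilibrium (Equil):

| | $B_1$ | $B_2$ | ... | $B_c$ | |
| $A_1$ | $p_1 q_1$ | $p_1 q_2$ | ... | $p_1 q_c$ | $p_1$ |
| $A_2$ | $p_2 q_1$ | $p_2 q_2$ | ... | $p_2 q_c$ | $p_2$ |
| ... | | | | | |
| $A_r$ | $p_r q_1$ | $p_r q_2$ | ... | $p_r q_c$ | $p_r$ |
| | $q_1$ | $q_2$ | ... | $q_c$ | 1 |

where $\pi_{ij}$ is the population frequency of haplotype $A_i B_j$.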

29. Data type
In reality, we never know $\pi_{ij}$. We observe a random sample of $n$ haplotypes, and we have the counts $n_{ij}$ of how many of these are of type $A_i B_j$:

| | $B_1$ | $B_2$ | ... | $B_c$ | |
| $A_1$ | $n_{11}$ | $n_{12}$ | ... | $n_{1c}$ | $n_{1\cdot}$ |
| ... | | | | | |
| $A_r$ | $n_{r1}$ | $n_{r2}$ | ... | $n_{rc}$ | $n_{r\cdot}$ |
| | $n_{\cdot 1}$ | $n_{\cdot 2}$ | ... | $n_{\cdot c}$ | $n$ |

Any measure or test has to be based on this table $\{n_{ij}\}$. To get the corresponding table under equilibrium, we fix the marginal counts: $n^{Equil}_{ij} = n_{i\cdot}\, n_{\cdot j}/n$.
⇒ the larger the distance between the two tables, the higher the disequilibrium;
⇒ the larger the distance between the two tables, the stronger the evidence against a random pattern.

30. Measures and tests of LD
One common distance between $\{n_{ij}\}$ and $\{n^{Equil}_{ij}\}$ is
$$\mathrm{Chi}(\{n_{ij}\}, \{n^{Equil}_{ij}\}) = \sum_{ij} \frac{(n_{ij} - n_{i\cdot} n_{\cdot j}/n)^2}{n_{i\cdot} n_{\cdot j}/n}.$$
Measure: we want to standardize the distance so that it is 0 if the table $\pi$ is in equilibrium and 1 if there is maximum disequilibrium ⇒ we need to define what maximum disequilibrium is.
Test: we want to evaluate the probability of observing a distance as large as the recorded one when the markers are in equilibrium.

31. Tests of LD
The focus is now on
$$\Pr\left(\mathrm{Chi}(\{n_{ij}\}, \{n^{Equil}_{ij}\}) > c \mid \text{linkage equilibrium}\right).$$
There are two ways of evaluating this:
1. Asymptotic approximation: when $n$ is large, $\mathrm{Chi}(\{n_{ij}\}, \{n^{Equil}_{ij}\}) \sim \chi^2_{(r-1)(c-1)}$, a $\chi^2$ distribution with $(r-1)(c-1)$ degrees of freedom.
2. Exact: we can evaluate the probability of observing the table $\{n_{ij}\}$ given the marginal counts and the hypothesis of linkage equilibrium, and go from there (Fisher exact test, coded in Mendel).

32. The χ² test
Marker D9S63 and DYT1:

| Allele | Disease chromosomes | Normal chromosomes | All |
| 16 | 33 | 36 | 69 |
| others | 3 | 270 | 273 |
| All | 36 | 306 | 342 |

$$\text{Test value} = \frac{(33 - 69 \cdot 36/342)^2}{69 \cdot 36/342} + \frac{(3 - 273 \cdot 36/342)^2}{273 \cdot 36/342} + \frac{(36 - 69 \cdot 306/342)^2}{69 \cdot 306/342} + \frac{(270 - 273 \cdot 306/342)^2}{273 \cdot 306/342} \approx 127.7$$
Now we use a $\chi^2_1$ table to evaluate the probability of getting such a test score or a bigger one under the hypothesis of equilibrium (the p-value). The p-value is less than 2.2e-16 (note that this is a bit different from what they report in the paper, as they use a continuity correction).
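The test value and its asymptotic p-value can be reproduced without a statistics package: for one degree of freedom, $\Pr(\chi^2_1 > x) = \Pr(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$ for a standard normal $Z$. A minimal sketch; the function names are ours, and no continuity correction is applied, matching the value on the slide.

```python
import math

def chi_statistic(table):
    """Sum over cells of (observed - expected)^2 / expected, with expected
    counts n_i. * n_.j / n computed from the margins (any r x c table)."""
    n = sum(map(sum, table))
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    chi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e = rows[i] * cols[j] / n  # expected count under equilibrium
            chi += (n_ij - e) ** 2 / e
    return chi

def chi2_1df_pvalue(x):
    """Upper tail of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

table = [[33, 36], [3, 270]]  # rows: allele 16, others; columns: disease, normal
chi = chi_statistic(table)
print(round(chi, 1), chi2_1df_pvalue(chi))  # 127.7 and a p-value far below 2.2e-16
```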

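33. The data and its probability

| | $B_1$ | $B_2$ | ... | $B_c$ | |
| $A_1$ | $n_{11}$ | $n_{12}$ | ... | $n_{1c}$ | $n_{1\cdot}$ |
| $A_2$ | $n_{21}$ | $n_{22}$ | ... | $n_{2c}$ | $n_{2\cdot}$ |
| ... | | | | | |
| $A_r$ | $n_{r1}$ | $n_{r2}$ | ... | $n_{rc}$ | $n_{r\cdot}$ |
| | $n_{\cdot 1}$ | $n_{\cdot 2}$ | ... | $n_{\cdot c}$ | $n$ |

Condition on the observed allele frequencies (the margins) and assume linkage equilibrium. For each table, we can evaluate its probability under the null hypothesis (it is called the Fisher-Yates distribution):
$$\Pr(\text{table} \mid n_{i\cdot}, n_{\cdot j}, \text{LE}).$$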
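For an $r \times c$ table, the Fisher-Yates probability has the closed form $\prod_i n_{i\cdot}!\,\prod_j n_{\cdot j}!\,/\,(n!\,\prod_{ij} n_{ij}!)$. The sketch below evaluates it with exact fractions; the function name is ours. Summing it over every table with the same margins gives 1, which is a useful sanity check.

```python
from fractions import Fraction
from math import factorial, prod

def fisher_yates_prob(table):
    """Pr(table | margins, LE) = prod(n_i.!) prod(n_.j!) / (n! prod(n_ij!))."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    num = prod(factorial(m) for m in rows) * prod(factorial(m) for m in cols)
    den = factorial(n) * prod(factorial(x) for row in table for x in row)
    return Fraction(num, den)

# All 2x2 tables with row margins (3, 4) and column margins (2, 5)
tables = [[[x, 3 - x], [2 - x, 2 + x]] for x in range(3)]
print([fisher_yates_prob(t) for t in tables], sum(fisher_yates_prob(t) for t in tables))
```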

34. Permutation description of the null hypothesis
To generate a table from the null distribution, I can use permutations: permute column 1 (marker 1) or column 2 (marker 2) of the list of sampled haplotypes.

| | Marker 1 | Marker 2 |
| Haplotype 1 | A4 | B4 |
| Haplotype 2 | A1 | B2 |
| Haplotype 3 | A3 | B1 |
| Haplotype 4 | A2 | B3 |
| Haplotype 5 | A1 | B2 |
| ... | ... | ... |
| Haplotype n | A2 | B1 |
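A permuted table is obtained by shuffling one column of the haplotype list and counting pairs again. Shuffling leaves both sets of allele counts, and hence the margins, unchanged, which is why permutation samples tables with fixed margins. In this sketch the six haplotypes written out on the slide are used as a complete toy sample (the "..." rows are not filled in), and the function names are hypothetical.

```python
import random
from collections import Counter

# The haplotypes listed on the slide, treated here as a toy sample
haplotypes = [("A4", "B4"), ("A1", "B2"), ("A3", "B1"),
              ("A2", "B3"), ("A1", "B2"), ("A2", "B1")]

def table_from_haplotypes(haps):
    """Counts n_ij, keyed by (marker 1 allele, marker 2 allele)."""
    return Counter(haps)

def permuted_table(haps, rng):
    """Shuffle the marker 2 column, then recount; margins are preserved."""
    col1 = [a for a, _ in haps]
    col2 = [b for _, b in haps]
    rng.shuffle(col2)
    return Counter(zip(col1, col2))

def margins(table):
    """Row (marker 1) and column (marker 2) totals of a count table."""
    rows, cols = Counter(), Counter()
    for (a, b), k in table.items():
        rows[a] += k
        cols[b] += k
    return rows, cols

rng = random.Random(0)
print(permuted_table(haplotypes, rng))
```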

35. Fisher exact test of independence
P-value: the sum of the probabilities of all the tables whose probability is no larger than that of the observed one.
P-value via permutations:
$$\text{P-value} = \frac{\#\{\text{permutations} : \Pr(\text{permut}) \le \Pr(\text{obser})\}}{\#\text{permutations}}$$
⇒ it is not based on asymptotic approximations (as a χ² test would be), so it is good for sparse tables.
⇒ we can estimate the p-value with a random sample of permutations.
The case of LD and disease can be treated similarly (see options 11 and 12 in Mendel). On our data, the result does not change: p-value < 2.2e-16.
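For a 2x2 table, fixing the margins leaves only $n_{11}$ free, and its null distribution is hypergeometric, so the exact p-value is a short sum. The sketch below computes it exactly, and also estimates it from random permutations as described on the slide. The function names, the small table `[[3, 1], [1, 3]]` and the number of permutations are illustrative choices.

```python
import random
from fractions import Fraction
from math import comb

def hypergeom_prob(x, r1, r2, c1):
    """Pr(n11 = x) given row totals r1, r2 and first column total c1."""
    return Fraction(comb(r1, x) * comb(r2, c1 - x), comb(r1 + r2, c1))

def fisher_exact_2x2(table):
    """Exact p-value: total probability of tables no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    p_obs = hypergeom_prob(a, r1, r2, c1)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = (hypergeom_prob(x, r1, r2, c1) for x in range(lo, hi + 1))
    return sum(pr for pr in probs if pr <= p_obs)

def permutation_pvalue_2x2(table, n_perm=20_000, seed=0):
    """Monte Carlo estimate: shuffle marker 2 alleles across haplotypes."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    p_obs = hypergeom_prob(a, r1, r2, c1)
    col1 = [1] * r1 + [2] * r2                 # marker 1 allele of each haplotype
    col2 = [1] * a + [2] * b + [1] * c + [2] * d  # marker 2 allele of each haplotype
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(col2)
        x = sum(1 for u, v in zip(col1, col2) if u == 1 and v == 1)  # permuted n11
        hits += hypergeom_prob(x, r1, r2, c1) <= p_obs
    return hits / n_perm

print(float(fisher_exact_2x2([[3, 1], [1, 3]])), permutation_pvalue_2x2([[3, 1], [1, 3]]))
print(float(fisher_exact_2x2([[33, 36], [3, 270]])))  # D9S63 / DYT1 data
```

On the D9S63 / DYT1 table a random sample of permutations would essentially never reach the observed table, so there the exact sum is more informative than the Monte Carlo estimate.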

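36. Maximum disequilibrium for 2x2 tables
Let $D = n_{11}/n - n_{1\cdot} n_{\cdot 1}/n^2$. Then $\{n_{ij}\}/n$ is

| | $B_1$ | $B_2$ | |
| $A_1$ | $n_{1\cdot} n_{\cdot 1}/n^2 + D$ | $n_{1\cdot}(n - n_{\cdot 1})/n^2 - D$ | $n_{1\cdot}/n$ |
| $A_2$ | $(n - n_{1\cdot}) n_{\cdot 1}/n^2 - D$ | $(n - n_{1\cdot})(n - n_{\cdot 1})/n^2 + D$ | $(n - n_{1\cdot})/n$ |
| | $n_{\cdot 1}/n$ | $(n - n_{\cdot 1})/n$ | 1 |

and
$$\mathrm{Chi} = \frac{n D^2}{n_{1\cdot}(n - n_{1\cdot})\, n_{\cdot 1}(n - n_{\cdot 1})/n^4} = D^2 f(n_{1\cdot}, n_{\cdot 1}, n).$$
We want the maximum value of $D$ given the marginal counts $n_{1\cdot}$, $n_{\cdot 1}$ (notice that, given the marginal counts, the denominator of Chi is constant).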
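The identity $\mathrm{Chi} = n D^2 / \big(n_{1\cdot}(n - n_{1\cdot})\, n_{\cdot 1}(n - n_{\cdot 1})/n^4\big)$ can be checked on the D9S63 / DYT1 counts, where it should reproduce the test value of about 127.7. A sketch; `chi_from_D` is a name introduced here.

```python
def chi_from_D(table):
    """Chi for a 2x2 table computed through D = n11/n - n1. n.1 / n^2."""
    (a, b), (c, d) = table
    n = a + b + c + d
    r1, c1 = a + b, a + c  # n_1. and n_.1
    D = a / n - r1 * c1 / n**2
    denom = r1 * (n - r1) * c1 * (n - c1) / n**4  # constant once the margins are fixed
    return n * D**2 / denom, D

chi, D = chi_from_D([[33, 36], [3, 270]])
print(round(chi, 1), round(D, 4))  # 127.7 0.0753
```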

37. Maximum disequilibrium when the marginal counts are identical: either A1 is always with B1 and A2 always with B2 (and vice versa), or A1 is always with B2 and A2 always with B1 (and vice versa).

38. Now, consider the case where the marginal counts are not identical: for example, B1 is always with A1, and A2 is always with B2.

39. We have two directions of maximal association ($D$ has more information than $D^2$, because its sign gives the direction). If the marginal counts are equal, only two haplotypes have positive frequency in the maximal association; otherwise, three do. The following formula gives a standardization of $D$ that incorporates this way of evaluating the maximal association:
$$D' = \begin{cases} \dfrac{D}{\min\left(n_{1\cdot}(n - n_{\cdot 1})/n^2,\ n_{\cdot 1}(n - n_{1\cdot})/n^2\right)} & \text{if } D \ge 0,\\[2ex] \dfrac{D}{\min\left(n_{1\cdot} n_{\cdot 1}/n^2,\ (n - n_{1\cdot})(n - n_{\cdot 1})/n^2\right)} & \text{if } D < 0. \end{cases}$$

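40. Example
Marker D9S63 and DYT1:

| Allele | Disease chromosomes | Normal chromosomes | All |
| 16 | 33 | 36 | 69 |
| others | 3 | 270 | 273 |
| All | 36 | 306 | 342 |

$$D = \frac{33}{342} - \frac{69 \cdot 36}{342^2} \approx 0.075$$
$$D' = \frac{D}{\min\left(69 \cdot 306/342^2,\ 36 \cdot 273/342^2\right)} = \frac{0.0753}{0.0840} \approx 0.896$$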
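The standardized measure of slide 39 fits in a few lines, and applying it to the D9S63 / DYT1 table reproduces the example ($D' = 8802/9828 \approx 0.896$). The function name `d_prime` is ours; it follows the slide's sign convention, so $D'$ lies between -1 and 1.

```python
def d_prime(table):
    """Return (D, D') for a 2x2 table of haplotype counts, following slide 39."""
    (a, b), (c, d) = table
    n = a + b + c + d
    r1, c1 = a + b, a + c  # n_1. and n_.1
    D = a / n - r1 * c1 / n**2
    if D >= 0:
        d_max = min(r1 * (n - c1), c1 * (n - r1)) / n**2
    else:
        d_max = min(r1 * c1, (n - r1) * (n - c1)) / n**2
    return D, D / d_max

D, Dp = d_prime([[33, 36], [3, 270]])
print(round(D, 3), round(Dp, 3))  # 0.075 0.896
```

Perfect association in either direction gives $D' = 1$ or $D' = -1$, whereas $D$ itself is bounded by values that depend on the allele frequencies.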