

OEB 242 Exam Practice Problems Answer Key

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

First, recall that by genetic diversity (or polymorphism, or heterozygosity) we typically mean $\Pi$, the average number of pairwise differences. Thus, it matters not only how many segregating sites we have, but also whether the alleles at those sites are more often at intermediate frequencies (which inflate the number of pairwise differences) or at rare/high frequencies (recall that if one allele is rare the other must be common, assuming a biallelic locus).

Background selection means that deleterious mutations and nearby linked variants are removed from the population as they arise. A high rate of recombination more easily allows linked neutral variants to detach themselves from deleterious variants and so persist in the population, whereas a low rate of recombination ensures that neutral variants are often removed along with the targets of background selection proper. More formally, Charlesworth showed that we can calculate the effective population size by scaling $N$ by a factor of $e^{-\mu/r}$, where $\mu$ and $r$ are the mutation and recombination rates, respectively. Thus recombination rate is positively correlated with $N_e$, which in turn is positively correlated with per-site diversity (recall that the infinite sites model predicts that $E(\Pi) = \theta = 4N_e\mu$). In other words, big $r$ → big $N_e$ → big $\theta$ → big $\Pi$.

Genetic hitchhiking means that beneficial mutations and nearby linked variants proliferate through a population together, causing a local reduction in diversity surrounding the adaptive allele. This process is counteracted by recombination, which breaks up haplotypes and restores diversity to the population. When the recombination rate is high, diversity is restored quickly, whereas when it is low, haplotypes can persist for long stretches of time, preserving a low value of $\Pi$.
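The background-selection chain of reasoning above (big $r$ → big $N_e$ → big $\theta$ → big $\Pi$) can be illustrated with a minimal Python sketch. It simply evaluates the two formulas quoted in the answer, $N_e = N e^{-\mu/r}$ and $E(\Pi) = 4N_e\mu$. The function names and the numerical values of `N`, `MU`, and the recombination rates are hypothetical, chosen only for illustration, and the same $\mu$ is used in both formulas, following the answer's notation.

```python
import math


def effective_size(n, mu, r):
    """Scale the census size n by exp(-mu / r), as in the answer above."""
    return n * math.exp(-mu / r)


def expected_diversity(n, mu, r):
    """Expected pairwise diversity under infinite sites: 4 * Ne * mu."""
    return 4 * effective_size(n, mu, r) * mu


# Hypothetical illustrative values, not estimates for any real organism.
N = 10_000
MU = 1e-8
RECOMBINATION_RATES = (1e-9, 1e-8, 1e-7)

for r in RECOMBINATION_RATES:
    ne = effective_size(N, MU, r)
    pi = expected_diversity(N, MU, r)
    print(f"r = {r:.0e}: Ne = {ne:.2f}, E(Pi) = {pi:.3e}")
```

Under these assumed values, expected diversity rises with $r$ and approaches the neutral expectation $4N\mu$ from below as recombination becomes frequent.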
Q2) What is the inbreeding coefficient of individual I in the pedigree below, assuming that the only nonzero values of $F$ among the ancestors of I are $F_A = 1/16$ and $F_D = 1/8$? You should leave your answer in the form of an unreduced expression.

Calculate $F_I$ by tracing all possible paths through common ancestors:

KGCADJL
KHDACGL
KHEBFJL
KHDJL
KGL

$$F_I = 2\left(\tfrac{1}{2}\right)^{7}(1+F_A) + \left(\tfrac{1}{2}\right)^{7}(1+F_B) + \left(\tfrac{1}{2}\right)^{5}(1+F_D) + \left(\tfrac{1}{2}\right)^{3}(1+F_G)$$

$$= 2\left(\tfrac{1}{2}\right)^{7}\left(1+\tfrac{1}{16}\right) + \left(\tfrac{1}{2}\right)^{7} + \left(\tfrac{1}{2}\right)^{5}\left(1+\tfrac{1}{8}\right) + \left(\tfrac{1}{2}\right)^{3} = 0.185$$

Q3) A population at steady state in the infinite-alleles neutral model has a homozygosity equal to 10%. What value of $\theta$ can you infer? With random mating, how many equally frequent alleles would be required to produce the same level of homozygosity?

At steady state, $F = 1/(1+\theta)$, so $\theta = (1-F)/F = 9$. With $n$ equally frequent alleles, each has frequency $p = 1/n$, giving us homozygosity $\sum_{i=1}^{n} p^2 = n(1/n)^2 = 1/n$ (think of, for example, a Punnett square with $n$ possibilities on each side: each square is equally probable with probability $(1/n)^2$, and there are $n$ homozygous squares). Hence, $n = 10$.

Q3) An agronomist is studying grain yield in an outbred variety of maize. The variety has a mean yield of 200 bushels per acre. The table below shows estimates of the additive genetic variance $V_A$, the dominance variance $V_D$, the environmental variance $V_E$, and the total phenotypic variance $V_P$, each expressed either as its estimated value in [bu/acre]^2 or as its estimated value as a fraction of the total phenotypic variance $V_P$.

Variance component   Estimated value      Value as fraction of V_P
V_A                  300 [bu/acre]^2      (missing)
V_D                  (missing)            14.3%
V_E                  300 [bu/acre]^2      (missing)
V_P                  (missing)            100.0%

A. Complete the missing entries in the table. [You may round the value of each variance component to the nearest 100.]

Let $x = V_P$. Then the values given imply that $300 + 0.143x + 300 = x$, hence $x = 600/0.857 \approx 700$ [bu/acre]^2. With this as the value of $V_P$, the rest of the entries in the table are as follows. (The fact that the numbers in the percent column add to 99.9% rather than 100.0% is due to round-off error.)

Variance component   Estimated value      Value as fraction of V_P
V_A                  300 [bu/acre]^2      42.8%
V_D                  100 [bu/acre]^2      14.3%
V_E                  300 [bu/acre]^2      42.8%
V_P                  700 [bu/acre]^2      100.0%

B. What are the values of the narrow-sense heritability $h^2$ and of the broad-sense heritability $H^2$? [In estimating the broad-sense heritability, ignore any possible effects of interaction between different genes.]

Recall that $h^2 = V_A/V_P$ and that $H^2 = V_G/V_P$, where $V_G = V_A + V_D$ here:

$$h^2 = \frac{300}{700} = 0.428 \quad\text{and}\quad H^2 = \frac{300 + 100}{700} = 0.571$$

C. As noted, the mean yield of the variety is 200 bu/acre. If the top-yielding 20% of the plants are selected for breeding, and mated randomly among themselves, this is equivalent to a selection differential of $S = 37$ bu/acre. What is the expected mean yield of the progeny of the selected parents?

Use the breeder's equation, $R = h^2 S$, where $h^2 = 0.428$ and $S = 37$, hence $R = 15.84$. Because $R = M' - M$, where $M'$ is the progeny mean and $M$ is the population mean, we can infer $M' = 200 + 15.84 = 215.84$ bu/acre.

Q4) Consider the two following phylogenetic topologies. If you were to calculate Tajima's D for each of them, what do you expect your results would be, and how would you interpret that? What if you were to use a McDonald-Kreitman test? When would it be appropriate to apply one or the other?

The left tree has relatively deep (ancient) coalescent times, whereas the tree on the right has relatively shallow (recent) coalescent times.

We would expect Tajima's D to be negative in the former case. Most mutations that we sprinkle onto this tree will happen on private branches, and so will be rare. Rare alleles contribute less to per-site heterozygosity than do intermediate-frequency alleles (consider how many pairwise differences: AAAC vs AACC?), so we will deflate $\theta_\Pi$ relative to $\theta_S$ for an overall negative statistic. This might suggest directional selection or population growth, for example. (Negative selection against deleterious alleles will reduce frequencies, and positive selection can also lead to a surplus of rare alleles as mutations appear on the homogeneous background produced by a selective sweep. To see why this topology is consistent with population growth, recall the relationship between population size and coalescent times predicted by the Kingman coalescent.)

We would expect Tajima's D to be positive in the latter case. Most mutations that we sprinkle onto this tree will be shared, and so will be common. By the above reasoning, this will inflate $\theta_\Pi$ relative to $\theta_S$ for an overall positive statistic. This might suggest balancing selection or admixture, for example. (Balancing selection will preserve polymorphisms at intermediate frequencies against the effects of drift, and admixture will have an overall averaging effect on allele frequencies between the two populations.)

Tajima's D is often applied broadly, as it assumes only the infinite sites model, and people are generally willing to make this assumption across a broad range of time scales. However, this model arguably begins to lose validity when our tree spans long evolutionary times (e.g. spanning speciation events), at which point multiple substitutions at a site become feasible; hence, not every mutation happens at a new site.

The McDonald-Kreitman test, on the other hand, assumes that we can partition our data into polymorphism and divergence, where the former refers to variation within a population of some species and the latter generally refers to variation between two species. Thus, we would generally only want to use this test if the root of the trees pictured above represents a speciation event. Moreover, we could only use this test if we are examining coding regions, because we need to be able to compare synonymous and non-synonymous changes, whereas Tajima's D can be applied to any genomic region.

If we applied the MKT to a topology like the one on the left, we would expect to find that polymorphism exceeds divergence (again, mutations sprinkled on the tree will create differences among individuals on the left branch).
This could be indicative of purifying selection between the species (which keeps divergence low) or balancing selection within the population (which keeps polymorphism high). If we applied the MKT to a topology like the one on the right, we would expect to find that divergence exceeds polymorphism. This could be indicative of positive selection between the species (which accelerates the accumulation of differences between them).

Q5) A geneticist is studying the hierarchical population structure of a species of ground squirrel in an area where there is a confluence of two wide streams to form a river. To determine whether the watercourses are significant barriers to gene flow, the researcher estimates allele frequencies of a biallelic gene from large samples of individuals from three subpopulations in each region. A diagram of the area and the allele frequencies in the subpopulations are shown below.

A. Estimate $H_S$, $H_R$, and $H_T$ for the subpopulations, regional populations, and total area.

Recall that when we look at the different levels of structure (subpops, regions, total) we are changing the granularity at which we define our allele frequencies, which we then use to calculate heterozygosity according to Hardy-Weinberg. In the case of subpops or regions, we then take an average. (In the case of the total population, doing so would be trivial.)

$$H_S = \frac{1}{9}\sum_{i=1}^{9} 2(0.1i)(1 - 0.1i) = 0.36667$$

$$H_R = \frac{2(0.4)(0.6) + 2(0.5)(0.5) + 2(0.6)(0.4)}{3} = 0.4867$$

$$H_T = 2(0.5)(0.5) = 0.5$$

B. Estimate $F_{SR}$, $F_{RT}$, and $F_{ST}$ for these populations.

Recall that $F_{XY} = (H_Y - H_X)/H_Y$. In other words, $F_{XY}$ is the reduction in heterozygosity relative to level $Y$, due to structure at the $X$ level.

$F_{SR} = 0.247$
$F_{RT} = 0.027$
$F_{ST} = 0.267$

C. Based on these estimates, do the watercourses appear to be a significant impediment to gene flow? (Please answer with either "Yes" or "No.")

No, because $F_{RT}$ is smaller than $F_{SR}$. Thus, the reduction in heterozygosity due to population structure at the level of regions is not as great as the reduction in heterozygosity due to structure at the level of subpops within those regions. In other words, most of the population structure appears at the subpop level, rather than the regional (watercourse-defined) level.
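The hierarchical calculation in parts A and B can be checked with a short Python sketch. It takes the subpopulation allele frequencies as 0.1 through 0.9 and the regional frequencies as 0.4, 0.5, and 0.6, as they appear in the expressions for $H_S$ and $H_R$; the function and variable names are illustrative choices, and no assumption is made about which subpopulation belongs to which region.

```python
def heterozygosity(p):
    """Hardy-Weinberg expected heterozygosity for a biallelic locus."""
    return 2 * p * (1 - p)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def f_statistic(h_lower, h_upper):
    """F_XY = (H_Y - H_X) / H_Y for a lower level X nested in level Y."""
    return (h_upper - h_lower) / h_upper


# Allele frequencies as used in the worked answer.
subpop_freqs = [0.1 * i for i in range(1, 10)]
region_freqs = [0.4, 0.5, 0.6]
total_freq = 0.5

h_s = mean(heterozygosity(p) for p in subpop_freqs)
h_r = mean(heterozygosity(p) for p in region_freqs)
h_t = heterozygosity(total_freq)

f_sr = f_statistic(h_s, h_r)
f_rt = f_statistic(h_r, h_t)
f_st = f_statistic(h_s, h_t)

print(f"H_S = {h_s:.4f}, H_R = {h_r:.4f}, H_T = {h_t:.4f}")
print(f"F_SR = {f_sr:.3f}, F_RT = {f_rt:.3f}, F_ST = {f_st:.3f}")
```

A useful consistency check is that the three statistics satisfy $(1 - F_{ST}) = (1 - F_{SR})(1 - F_{RT})$, which follows directly from the definition of $F_{XY}$.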

Q6) The equation $d(t) = \frac{19}{20}\left(1 - e^{-40\alpha t}\right)$ gives the Jukes-Cantor-corrected proportion of amino acid differences between two aligned protein sequences from different species that diverged from a common ancestral species that existed $t$ years ago. The rate of amino acid replacement in each lineage is given by $20\alpha$. Orthologous protein molecules were compared in two pairs of species. One species pair had diverged twice as long ago as the other species pair. In the more divergent species pair, the observed percentage of amino acid differences in the protein was 91.1%, whereas in the more recently diverged species pair the observed percentage of amino acid differences in the protein was 52.3%.

A. Are these data consistent with a molecular clock?

Letting $t = \tau$ equal the time of divergence of the less divergent species pair, the question states that the time of divergence of the more divergent species pair is $t = 2\tau$. The equation for $d(t)$ implies that $-\ln[1 - 20d(t)/19] = 40\alpha t$. Hence $40\alpha\tau = -\ln[1 - (20)(0.523)/19] = 0.800$, or $20\alpha\tau = 0.400$, in the less divergent species pair. In the more divergent species pair, $40\alpha(2\tau) = -\ln[1 - (20)(0.911)/19] = 3.193$, or $20\alpha\tau = 0.798$. The rates of amino acid replacement ($20\alpha$) are therefore $0.400/\tau$ and $0.798/\tau$ in the two comparisons, which is not consistent with a molecular clock.

B. From these data, can one estimate the absolute rate of amino acid replacement in each lineage?

No. The percent differences depend on the product $\alpha\tau$, and since neither factor is known, neither can be specified. A faster rate would result in the same percent differences in a shorter time, and a slower rate would result in the same percent differences in a longer time.

C. From these data, can one estimate the relative rate of amino acid replacement in each lineage?

Yes. From these data we can say that the rate of amino acid replacement ($20\alpha$) in the more divergent species pair, relative to that in the less divergent species pair, is greater by a factor of $(0.798/\tau)/(0.400/\tau) \approx 2$.
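The inversion of $d(t)$ used in part A is easy to get wrong by hand, so here is a small Python sketch that recomputes the scaled rates from the two observed percentages. It assumes only the equation given in the question; the function and variable names are illustrative, and $\tau$ is left as an unknown unit of time, so the results are values of $20\alpha\tau$ rather than absolute rates.

```python
import math


def scaled_divergence(d):
    """Invert d = (19/20) * (1 - exp(-40*alpha*t)) to obtain 40*alpha*t."""
    return -math.log(1 - 20 * d / 19)


# Observed proportions of amino acid differences from the question.
d_recent = 0.523   # pair that diverged tau years ago
d_older = 0.911    # pair that diverged 2*tau years ago

# 40*alpha*tau for the recent pair, 40*alpha*(2*tau) for the older pair.
x_recent = scaled_divergence(d_recent)
x_older = scaled_divergence(d_older)

# Express both as 20*alpha*tau so the per-lineage rates are comparable.
rate_recent = x_recent / 2
rate_older = x_older / 4
relative_rate = rate_older / rate_recent

print(f"20*alpha*tau (recent pair): {rate_recent:.3f}")
print(f"20*alpha*tau (older pair):  {rate_older:.3f}")
print(f"relative rate (older / recent): {relative_rate:.2f}")
```

Because $\tau$ cancels in the ratio, the relative rate is recoverable even though the absolute rates are not, which is the point of parts B and C.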
Q7) You are examining a species of flower that is normally blue. Occasionally plants with red flowers are observed in wild populations. You determine that flower color is controlled at a single locus, with the red allele completely recessive to the blue allele. You conduct a survey in a field and find 3000 blue flowers and 500 red flowers. You then look at the mean number of seed pods produced by the flowers, and find that the blue plants on average produce 20 pods whereas the red plants on average produce 15. Assuming that the alleles are currently in HWE, but that selection is operating, predict the genotype frequencies after another generation. Assume that seed pod count is a perfect proxy for fitness (e.g. all seeds produced successfully take root, etc.). If the blue allele mutates to a red allele at the rate of $10^{-5}$ per generation, what will the equilibrium frequency of the red allele be at mutation-selection balance?

We first need to calculate the relative fitness of each genotype. We can let B represent the dominant (blue) allele and b represent the recessive (red) allele. In this case, our relative fitnesses are as follows:

$w_{BB} = 1$; $w_{Bb} = 1$; $w_{bb} = 15/20 = 0.75$

We can next calculate mean fitness by assuming HWE. The frequency of the bb genotype is $500/3500 = 0.1429$, suggesting that $q = \sqrt{0.1429} = 0.378$ and $p = 0.622$. Our mean fitness is

$$\bar{w} = p^2 w_{BB} + 2pq\,w_{Bb} + q^2 w_{bb} = (0.387)(1) + (0.4702)(1) + (0.1429)(0.75) = 0.964$$

We can now divide each term in the above sum by $\bar{w}$ to get the predicted genotype frequencies:

BB = 0.401; Bb = 0.488; bb = 0.111

To find the equilibrium frequency, we can use the formula $\hat{q} = \sqrt{\mu/s}$, which holds when the harmful allele is a complete recessive ($h = 0$). We now need to find $s$. Since $w_{bb} = 1 - s = 0.75$, we can infer that $s = 0.25$. Our equilibrium frequency is $\hat{q} = \sqrt{10^{-5}/0.25} = 0.006$.
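The selection step and the mutation-selection balance above can be reproduced with a brief Python sketch. It follows the same assumptions as the answer (Hardy-Weinberg proportions before selection, seed pod counts as relative fitness, a completely recessive deleterious allele); the variable names are illustrative only.

```python
import math

# Survey counts and mean seed pod numbers from the question.
blue_count, red_count = 3000, 500
pods_blue, pods_red = 20, 15

# Relative fitnesses: red (bb) is measured against blue (BB and Bb).
fitness = {"BB": 1.0, "Bb": 1.0, "bb": pods_red / pods_blue}

# Under HWE the frequency of red plants equals q squared.
q = math.sqrt(red_count / (blue_count + red_count))
p = 1 - q

before = {"BB": p ** 2, "Bb": 2 * p * q, "bb": q ** 2}

# Mean fitness, then genotype frequencies after one round of selection.
mean_fitness = sum(before[g] * fitness[g] for g in before)
after = {g: before[g] * fitness[g] / mean_fitness for g in before}

# Mutation-selection balance for a complete recessive: q_hat = sqrt(mu / s).
mutation_rate = 1e-5
s = 1 - fitness["bb"]
q_hat = math.sqrt(mutation_rate / s)

print(f"q = {q:.3f}, p = {p:.3f}, mean fitness = {mean_fitness:.3f}")
for genotype, freq in after.items():
    print(f"{genotype}: {freq:.3f}")
print(f"equilibrium frequency of b: {q_hat:.4f}")
```

Dividing by the mean fitness is what makes the post-selection frequencies sum to one, which is a quick sanity check on the arithmetic.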