Lecture 18 - Selection and Tests of Neutrality

Readings: Gibson and Muse, chapter 5; Nei and Kumar, chapter 12.6, p. 258-264; Hartl, chapter 3, p. 22-27

The Usefulness of Theta
Under evolution by genetic drift alone (i.e., neutral evolution), each estimator of theta is an unbiased estimator of the true value of theta:
$$E(\hat{\theta}_W) = E(\hat{\theta}_\pi) = \theta$$
Therefore, differences between the estimates from different estimators can be used to infer non-neutral evolution (i.e., natural selection). Keep in mind that there are other estimators of theta as well. Each estimator, however, has the same expectation under neutrality (i.e., the true value of theta).
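The two estimators named above can be computed directly from a set of aligned sequences. The following is a minimal sketch, assuming an ungapped alignment given as equal-length strings; the function names and the toy `sample` are hypothetical and serve only as illustration. Values are per locus, not per site.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Count alignment columns that contain more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def theta_w(seqs):
    """Watterson's estimator: S divided by a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n

def theta_pi(seqs):
    """Nucleotide diversity: mean number of pairwise differences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)

# Toy alignment of four sequences (hypothetical data).
sample = ["ATGCA", "ATGCT", "ACGCA", "ATGCA"]
print(segregating_sites(sample), theta_w(sample), theta_pi(sample))
```

Under neutrality the two estimates should agree on average; a consistent gap between them is the signal that the tests in this lecture try to quantify.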

Neutrality Tests Using Theta
Given that we observe a difference between estimates of theta from different estimators, can we use this information to test for selection?
Tajima's D (Tajima, 1989): calculate D and perform a statistical test of the form:
$H_0: D = 0$ (no selection)
$H_A: D \neq 0$ (selection)

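Tajima's D - The Test Statistic
From the observed data, calculate the following:
$$D = \frac{\hat{\theta}_\pi - \hat{\theta}_W}{\sqrt{\mathrm{var}(\hat{\theta}_\pi - \hat{\theta}_W)}}$$
$$\mathrm{var}(\hat{\theta}_\pi - \hat{\theta}_W) = \frac{\dfrac{n+1}{3(n-1)} - \dfrac{1}{a_n}}{a_n}\, S + \frac{\dfrac{2(n^2+n+3)}{9n(n-1)} - \dfrac{n+2}{n\,a_n} + \dfrac{b_n}{a_n^2}}{a_n^2 + b_n}\, S(S-1)$$
where $n$ is the number of sequences, $S$ is the number of segregating sites, $a_n = \sum_{i=1}^{n-1} 1/i$, and $b_n = \sum_{i=1}^{n-1} 1/i^2$.
If D is too negative or too positive, then $H_0$ is rejected. What counts as too positive or too negative is determined by simulating DNA sequence samples under neutral evolution.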
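The test statistic can be written as a short function. This is a sketch under stated assumptions: `pi` is the mean number of pairwise differences for the whole locus (not per site), `S` is the number of segregating sites, and the function name and the input values in the usage line are illustrative only.

```python
import math

def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S, and per-locus pi."""
    a_n = sum(1.0 / i for i in range(1, n))        # sum of 1/i
    b_n = sum(1.0 / i ** 2 for i in range(1, n))   # sum of 1/i^2
    # Coefficient of S in the variance.
    e1 = ((n + 1) / (3.0 * (n - 1)) - 1.0 / a_n) / a_n
    # Coefficient of S(S - 1) in the variance.
    e2 = (2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
          - (n + 2) / (n * a_n)
          + b_n / a_n ** 2) / (a_n ** 2 + b_n)
    variance = e1 * S + e2 * S * (S - 1)
    theta_w = S / a_n                              # Watterson's estimator
    return (pi - theta_w) / math.sqrt(variance)

# Illustrative input: 10 sequences, 16 segregating sites, pi = 3.888889.
print(round(tajimas_d(10, 16, 3.888889), 3))
```

A negative value means that pi is smaller than Watterson's estimate (many rare variants); a positive value means the opposite.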

Nucleotide diversity in loblolly pine candidate genes for drought (Gonzalez-Martinez et al., Genetics 2006)
Columns: Gene, n, then S, θ, π for each of Total, Synonymous, Nonsynonymous, and Silent (non-coding + synonymous) sites, then Tajima's D.
lp3-32 8 8.77 2.70 9.3 2.34 2.29 0.58 7 7.40 2.47 -.05088
lp3-3 32 3.59 0.97 0 0 0 2 2.0.42.08 0.53-0.89756
dhn- 32 3 5.08 4.5 4 8.88 7.5 3.83.72 0.30 8.8-0.59854
dhn-2 3 4 6.64 7.76 6 5.3 6.58 4 3.03 2.87 0 3.7 6.56 0.56236
mtl-like 32 9 5.55 5.0 0 0 0 2 6.77 2.50 7 5.27 5.68-0.2479
sod-chl 32 9 6.88 7.80 2 2.6 7.54 2 3.86 4. 7 7.57 8.65 0.45770
ferritin 32 7 2.92.28 0 0 0 2.07 0.52 6 3.4.48 -.63069
rd2a-2 3 26 7.02 7.68 7 3.28 9.66 5 2.84 3.53 2.40.5 0.3370
sams-2 32 6 2.75 3.60 2 6.07 9.07 0 0 0 6 5.40 7.06 0.86302
pal- 3 6 3.8 2.62 4.3 2.06.35 0.35 5 6.00 4.64-0.8854
ccoaomt- 32 3 6.68.86 4 8.07 26.56 0 0 0 3.67 9.23 2.52548*
cpk3 32 8 3.6 3.55 3 9.49 3.82 0.84 0.59 7 5.26 6.22 0.36974
pp2c 32 0.39 0.0 2.20 0.55 0 0 0 0.86 0.22 -.4244
Aqua-MIP 32 5 2.06.74 2 7.03 0.50 0 0 0 5 3.03 2.55-0.429
erd3 32 6.70 0.43 0 0 0 2.04 2.26 4 2.48 0.62-2.098*
ug-2_498 32 0 8.65 5.26 - - - - - - - - - -.22364
AVERAGE 0 4.6 4.79 2.2 7.09 7.06.6.87.36 9 7 7.08
* P < 0.05; ** P < 0.01. Diversity values are multiplied by 10^3.

Tajima's D - An Example
[Figure: haplotype alignments at the segregating sites of two genes, with indel polymorphisms coded as I (TGAGGAAACAGGGGTG and ACTT). Panels: ccoaomt-1, Tajima's D = 2.52; erd3, Tajima's D = -2.10.]
ccoaomt-1: excess of common (intermediate-frequency) variants. Balancing selection?
erd3: excess of rare variants. Genetic hitchhiking?

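Other Neutrality Tests Using Theta
Fu and Li's D*:
$$D^* = \frac{\hat{\theta}_W - \hat{\theta}_\eta}{\sqrt{\mathrm{var}(\hat{\theta}_W - \hat{\theta}_\eta)}}$$
Zeng et al.'s H:
$$H_{norm} = \frac{\hat{\theta}_\pi - \hat{\theta}_L}{\sqrt{\mathrm{var}(\hat{\theta}_\pi - \hat{\theta}_L)}}$$
Here $\hat{\theta}_\eta$ is the estimate of theta based on the number of singleton mutations, and $\hat{\theta}_L$ weights each derived variant by its frequency in the sample.
$H_{norm}$ requires an outgroup to identify which allele at each site is derived. Fu and Li's D* does not; the related statistic, Fu and Li's D, does.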
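The estimators behind these statistics can all be written as weighted sums over the unfolded site frequency spectrum. The sketch below assumes the input `sfs` lists, for i = 1 to n - 1, the number of sites at which the derived allele is carried by i of the n sequences (which is why an outgroup is needed for the theta_L term). The function name is hypothetical, and only the unnormalized difference between theta_pi and theta_L is shown, not its variance.

```python
def theta_from_sfs(sfs):
    """Return (theta_W, theta_pi, theta_L) from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of sites whose derived allele occurs in i sequences.
    """
    n = len(sfs) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    counts = list(enumerate(sfs, start=1))
    theta_w = sum(sfs) / a_n
    theta_pi = sum(i * (n - i) * x for i, x in counts) / (n * (n - 1) / 2.0)
    theta_l = sum(i * x for i, x in counts) / (n - 1.0)
    return theta_w, theta_pi, theta_l

# Expected neutral spectrum for theta = 6 and n = 7 (xi_i = theta / i).
neutral = [6.0 / i for i in range(1, 7)]
print(theta_from_sfs(neutral))

# A spectrum dominated by high-frequency derived variants (hypothetical counts).
skewed = [0, 0, 0, 0, 0, 5]
tw, tp, tl = theta_from_sfs(skewed)
print(tp - tl)  # unnormalized H: negative when high-frequency derived variants are in excess
```

With the expected neutral spectrum all three estimators return the same value, which is the property that makes their differences usable as test statistics.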

Pitfalls of D, D*, and H_norm as Tests for Selection
Moderate to low statistical power (i.e., the null hypothesis is often not rejected when in reality it should be).
Sensitive to the demographic history of the populations from which the samples were taken. Population expansions tend to produce significantly negative values of D, while population bottlenecks tend to produce significantly positive values of D.
Moreover, any process that affects the shape and depth of the gene tree will affect these statistics.
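The dependence on demography comes from the null model itself: critical values for D are usually obtained by simulating genealogies for a neutral, constant-size, randomly mating population, so any departure from that demography shifts the observed D. The following is a rough sketch of such a simulation, not a replacement for a dedicated coalescent simulator. It assumes no recombination, an infinite-sites model, and a fixed number of segregating sites S dropped onto each simulated tree in proportion to branch length. All function names, the seed, and the values n = 32 and S = 13 are illustrative assumptions.

```python
import math
import random

def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S, and per-locus pi."""
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i ** 2 for i in range(1, n))
    e1 = ((n + 1) / (3.0 * (n - 1)) - 1.0 / a_n) / a_n
    e2 = (2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
          - (n + 2) / (n * a_n) + b_n / a_n ** 2) / (a_n ** 2 + b_n)
    return (pi - S / a_n) / math.sqrt(e1 * S + e2 * S * (S - 1))

def simulate_d(n, S, rng):
    """One neutral replicate: a random coalescent tree with S mutations placed on it."""
    sizes = [1] * n       # number of sampled sequences descending from each lineage
    segments = []         # (branch-segment length, descendant count)
    while len(sizes) > 1:
        k = len(sizes)
        t = rng.expovariate(k * (k - 1) / 2.0)   # waiting time to the next coalescence
        segments.extend((t, s) for s in sizes)
        i, j = rng.sample(range(k), 2)           # pick two lineages to merge
        merged = sizes[i] + sizes[j]
        sizes = [s for idx, s in enumerate(sizes) if idx not in (i, j)]
        sizes.append(merged)
    lengths = [seg[0] for seg in segments]
    counts = [seg[1] for seg in segments]
    # Each mutation falls on a segment with probability proportional to its length.
    derived = rng.choices(counts, weights=lengths, k=S)
    pi = sum(c * (n - c) for c in derived) / (n * (n - 1) / 2.0)
    return tajimas_d(n, S, pi)

def neutral_interval(n, S, reps=2000, seed=1):
    """Approximate central 95% interval of D under the constant-size neutral model."""
    rng = random.Random(seed)
    ds = sorted(simulate_d(n, S, rng) for _ in range(reps))
    return ds[int(0.025 * reps)], ds[int(0.975 * reps) - 1]

low, high = neutral_interval(32, 13)
print(low, high)
```

An observed D outside the simulated interval rejects the full null model, which includes the demographic assumptions as well as neutrality. Replacing the constant coalescence rate with one that reflects growth or a bottleneck would move the interval.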

Neutrality Tests - A Different Approach
The previous approach is susceptible to any process that affects the genealogy of a sample of gene sequences. If a test uses information that is standardized by the genealogy, then this may not be a problem.

Expectations from Genealogies
Comparisons of polymorphism within species to divergence between species, so-called polymorphism-divergence tests, can remove the dependency upon the genealogy for tests of selection.

The Hudson, Kreitman, and Aguadé (HKA) Test
The HKA test uses data from two different species. Under neutral evolution, the expected number of differences (D) between two homologous sequences (one from each species) is given by:
$$E(D) = k\theta(T + 1)$$
$$\mathrm{var}(D) = k\theta(T + 1) + k^2\theta^2$$
Similarly, the expected number of polymorphisms (S) within each species is given by:
$$E(S) = \hat{\theta}_W a_n$$
$$\mathrm{var}(S) = \hat{\theta}_W a_n + b_n \hat{\theta}_W^2$$
where $a_n = \sum_{i=1}^{n-1} 1/i$ and $b_n = \sum_{i=1}^{n-1} 1/i^2$ for a sample of $n$ sequences.
The rationale for the HKA test is that under neutral evolution, the diversity within each species depends upon theta, but the divergence between species depends on theta AND time (T).

The HKA Test Statistic
The equations from the last slide can be used to construct an $X^2$ statistic with 2L - 2 degrees of freedom (L = number of loci; L must be ≥ 2):
$$X^2 = \sum_i \sum_j \frac{[S_{ij} - E(S_{ij})]^2}{\mathrm{var}(S_{ij})} + \sum_i \frac{[D_i - E(D_i)]^2}{\mathrm{var}(D_i)}$$
where i indexes loci and j indexes the two species.
If $X^2$ is large enough, the null hypothesis of neutral evolution can be rejected. (If E(S) and E(D) are estimated from the same data as those used in the test, $X^2$ must be bigger than 3.84 for L = 2 at a significance level of 0.05.)
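Once expectations and variances are available, the statistic is a plain sum of squared standardized deviations. The sketch below makes several simplifying assumptions that are not part of the original slides: the per-locus theta (the product of k and theta in the divergence formulas) and the divergence time T are supplied by the caller rather than estimated from the data, and both species are assumed to share the same theta. The function names, dictionary keys, and input numbers are hypothetical.

```python
def harmonic_terms(n):
    """Return a_n = sum 1/i and b_n = sum 1/i^2 for i = 1 .. n-1."""
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i ** 2 for i in range(1, n))
    return a_n, b_n

def hka_statistic(loci, T):
    """Sum of squared standardized deviations over loci and species.

    Each locus is a dict with keys: theta, n_A, n_B, S_A, S_B, D.
    """
    x2 = 0.0
    for locus in loci:
        theta = locus["theta"]
        # Polymorphism terms, one per species (index j in the formula).
        for n, S in ((locus["n_A"], locus["S_A"]), (locus["n_B"], locus["S_B"])):
            a_n, b_n = harmonic_terms(n)
            expected_s = theta * a_n
            variance_s = theta * a_n + b_n * theta ** 2
            x2 += (S - expected_s) ** 2 / variance_s
        # Divergence term for this locus.
        expected_d = theta * (T + 1)
        variance_d = theta * (T + 1) + theta ** 2
        x2 += (locus["D"] - expected_d) ** 2 / variance_d
    return x2

# Hypothetical two-locus data set.
loci = [
    {"theta": 2.0, "n_A": 10, "n_B": 10, "S_A": 12, "S_B": 5, "D": 9},
    {"theta": 4.0, "n_A": 10, "n_B": 10, "S_A": 10, "S_B": 11, "D": 30},
]
print(hka_statistic(loci, T=4.0))
```

In a full analysis theta for each locus and T are estimated from the same observations, which is what reduces the degrees of freedom to 2L - 2.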

Interpretations of HKA Results
A significant HKA result can be due to a number of causes:
Too much polymorphism within species at synonymous sites. This is consistent with purifying selection.
Too much divergence between species at nonsynonymous sites. This is consistent with positive directional selection.
The HKA test is also conservative because it assumes complete linkage within each gene and free recombination between genes.

The McDonald-Kreitman (MK) Test
The MK test removes effects of the genealogy by classifying the variable sites at a given locus into four types:
Synonymous/Polymorphic
Synonymous/Fixed
Nonsynonymous/Polymorphic
Nonsynonymous/Fixed

The MK Table and Tests for Independence
                 Fixed        Polymorphic
Nonsynonymous    N_F          N_P
Synonymous       S_F          S_P
Ratio            N_F / S_F    N_P / S_P
Under neutrality, $N_F / S_F = N_P / S_P$.
Under positive directional selection, $N_F / S_F > N_P / S_P$.
A likelihood ratio test for independence in a contingency table is then used to test the null hypothesis of neutrality. If we assume that $S_F = S_P$ under neutrality, the test statistic is given as:
$$G = 2 \sum_{\text{all loci}} [N_F, N_P] \ln\left(\frac{[N_F, N_P]}{\mathrm{avg}[N_F, N_P]}\right)$$
where $[N_F, N_P]$ stands for each of the two nonsynonymous counts in turn, and their average is the value expected for both under neutrality.
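For a single locus, the likelihood ratio test for independence can be carried out on the 2 x 2 table directly. The sketch below uses the general G-test, with expected counts taken from the row and column totals, rather than the simplified form that assumes equal synonymous counts. The P value uses the chi-square distribution with one degree of freedom. The function name and the counts in the usage line are illustrative.

```python
import math

def mk_g_test(n_fixed, n_poly, s_fixed, s_poly):
    """G-test of independence on the MK table; returns (G, P value for 1 df)."""
    observed = [[n_fixed, n_poly], [s_fixed, s_poly]]
    total = float(n_fixed + n_poly + s_fixed + s_poly)
    row_sums = [sum(row) for row in observed]
    col_sums = [n_fixed + s_fixed, n_poly + s_poly]
    g = 0.0
    for r in range(2):
        for c in range(2):
            o = observed[r][c]
            e = row_sums[r] * col_sums[c] / total   # expected count under independence
            if o > 0:                               # empty cells contribute zero
                g += 2.0 * o * math.log(o / e)
    g = max(g, 0.0)                                 # guard against rounding below zero
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

# Illustrative counts: 7 fixed and 2 polymorphic nonsynonymous sites,
# 17 fixed and 42 polymorphic synonymous sites.
g, p = mk_g_test(7, 2, 17, 42)
print(round(g, 3), p)
```

Here the ratio of nonsynonymous to synonymous changes is much higher among fixed differences (7/17) than among polymorphisms (2/42), the pattern the slide associates with positive directional selection.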

dN/dS
Related to the MK test are tests based on the dN/dS ratio. These tests assume that under neutrality, dN/dS = 1.0. Statistical tests are then constructed to test whether or not an observed dN/dS deviates from