Detecting selection from differentiation between populations: the FLK and hapFLK approach


Bertrand Servin
Maria-Ines Fariello, Simon Boitard, Claude Chevalet, Magali SanCristobal, Maxime Bonhomme
INRA Animal Genetics, Toulouse, France
June 18, 2013

Introduction
- We consider a set of populations differentiated through the effect of drift.
- Because selection modifies allele frequencies within a population, it amplifies differentiation between populations at selected loci.
- Differentiation-based tests for selection are characterized by:
  - their model for background differentiation (the "neutral demographic model");
  - how they capture outliers or model selective effects.
- FLK (and hapFLK):
  - Pure drift model with population splits (tree demography, "phylogeny").
  - Outlier approach: look for genome regions where the neutral (null) model does not fit well (goodness-of-fit statistic).

Outline
1. Theoretical Background
   - Neutral model for SNP data in multiple populations
   - Single SNP statistics for detecting selection: FLK
   - Incorporating haplotype information: hapFLK
2. Genome Scanning with hapFLK


Single population: evolution of allele frequency under drift
- Consider a biallelic locus (SNP) in a population evolving under pure drift, starting at frequency $p_0$.
- Let $F_t$ be the fixation index of the population after $t$ generations: $F_t = 1 - \left(1 - \frac{1}{2N}\right)^t \approx \frac{t}{2N}$.
- Provided $F_t$ is small, we can model $p(t) \sim \mathcal{N}\left(p_0,\; F_t\, p_0 (1 - p_0)\right)$. See Nicholson et al. (2002).

Single population: evolution of allele frequency under drift (continued)
Figure: simulated allele-frequency trajectories compared with the Normal approximation.
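The comparison between simulated trajectories and the Normal approximation can be sketched with a small Wright-Fisher simulation. This is only an illustration under assumed values: the function names and the parameters ($p_0 = 0.5$, $N = 50$ diploids, $t = 10$ generations, 2000 replicates) are hypothetical choices, not values taken from the slides. It checks that the variance of the final frequency is close to $F_t\, p_0 (1 - p_0)$.

```python
import random


def simulate_final_frequency(p0, n_diploid, generations, rng):
    """Allele frequency after `generations` of pure Wright-Fisher drift."""
    n_copies = 2 * n_diploid
    p = p0
    for _generation in range(generations):
        # Binomial sampling of the 2N gene copies of the next generation.
        count = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count / n_copies
    return p


def drift_variance_check(p0=0.5, n_diploid=50, generations=10,
                         replicates=2000, seed=1):
    """Compare the simulated variance of p(t) with F_t * p0 * (1 - p0)."""
    rng = random.Random(seed)
    finals = [simulate_final_frequency(p0, n_diploid, generations, rng)
              for _replicate in range(replicates)]
    mean = sum(finals) / replicates
    empirical_var = sum((x - mean) ** 2 for x in finals) / (replicates - 1)
    # Fixation index after t generations: F_t = 1 - (1 - 1/(2N))^t.
    f_t = 1.0 - (1.0 - 1.0 / (2 * n_diploid)) ** generations
    predicted_var = f_t * p0 * (1.0 - p0)
    return empirical_var, predicted_var, f_t


empirical_var, predicted_var, f_t = drift_variance_check()
print(f"F_t = {f_t:.4f}")
print(f"empirical variance = {empirical_var:.4f}")
print(f"predicted variance = {predicted_var:.4f}")
```

The variance formula holds for any $F_t$ under this sampling scheme; it is the Normal shape of the distribution that requires $F_t$ to be small and $p_0$ to be away from 0 and 1.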

Multiple populations: star-like evolution
- Consider an ancestral population that splits at time $t_0$ into multiple populations evolving in parallel, i.e. a star-like population tree.
- Assume no mutation after the split ($F_t$ is small). Then, for each population $i$: $p_i \sim \mathcal{N}\left(p_0,\; F_i\, p_0 (1 - p_0)\right)$.
- NB: if we were to assume the same $F_i$ for each population, then $F_i = F_{ST}$.

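Multiple populations: tree-like evolution
- Under the star-like model, conditional on $p_0$, all populations are independent: $\mathrm{Cov}(p_i, p_j) = 0$.
- If we allow $\mathrm{Cov}(p_i, p_j) \neq 0$, we obtain a population tree.
- Example with three populations, where populations 1 and 2 share an ancestral branch:
  - $F_3 = 1 - \left(1 - \frac{1}{2N_3}\right)^{t}$
  - $f_{12} = 1 - \left(1 - \frac{1}{2N_{12}}\right)^{t_{12}}$
- $\mathrm{Var}(p_i) = F_i\, p_0 (1 - p_0)$ and $\mathrm{Cov}(p_i, p_j) = f_{ij}\, p_0 (1 - p_0)$.
- Kinship matrix:
  $$F = \begin{pmatrix} F_1 & f_{12} & 0 \\ f_{12} & F_2 & 0 \\ 0 & 0 & F_3 \end{pmatrix}, \qquad \mathrm{Var}(p) = F\, p_0 (1 - p_0)$$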

Estimation of the neutral model
- This evolutionary model (population tree, pure drift) has two parameters: $p_0$ and $F$.
- Suppose $F$ is known. A natural estimator of $p_0$ is then the generalized least squares estimator:
  $$\hat{p}_0 = \frac{\mathbf{1}^T F^{-1} p}{\mathbf{1}^T F^{-1} \mathbf{1}}$$
- Estimating $F$ means reconstructing the population tree, with branch lengths expressed in units of fixation indices.
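As a minimal sketch of the generalized least squares estimator, the following computes $\hat{p}_0$ for a three-population example. The kinship matrix and the allele frequencies are made-up illustrative values, and the helper names (`solve`, `gls_ancestral_frequency`) are hypothetical; a small Gaussian elimination is used so that the example needs no external library.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gls_ancestral_frequency(kinship, freqs):
    """GLS estimate of p0: (1' F^-1 p) / (1' F^-1 1)."""
    # F is symmetric, so 1' F^-1 p equals (F^-1 1)' p.
    weights = solve(kinship, [1.0] * len(freqs))
    return sum(w * p for w, p in zip(weights, freqs)) / sum(weights)


# Illustrative values: populations 1 and 2 share ancestry (f_12 = 0.05).
kinship = [
    [0.10, 0.05, 0.00],
    [0.05, 0.10, 0.00],
    [0.00, 0.00, 0.10],
]
freqs = [0.6, 0.6, 0.3]

p0_hat = gls_ancestral_frequency(kinship, freqs)
print(f"GLS estimate of p0: {p0_hat:.4f}")  # 0.4714, versus 0.5 for the plain mean
```

Because populations 1 and 2 are correlated, they carry less independent information than two unrelated populations would, so the estimate is pulled toward population 3 relative to the unweighted mean.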

Estimation of the population kinship matrix
- Branch lengths of the tree are measured in units of drift ($\approx t/2N$).
- For each pair of populations $i$ and $j$, the Reynolds genetic distance $D$ (Reynolds, Weir and Cockerham, 1983) has expectation $E(D_{ij}) = \frac{F_i + F_j}{2}$; see Laval et al. (2002).
- The population tree can be built using the neighbour-joining algorithm on the matrix of Reynolds distances, computed over many (on the order of $10^4$) SNPs. This assumes that the majority of them are neutral.
- Rooting the tree requires an outgroup. If none is available, midpoint rooting is used.
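A minimal sketch of a Reynolds-type distance for biallelic SNPs follows. It uses the simplified form $D_{ij} = \sum_l (p_{il} - p_{jl})^2 / \sum_l (p_{il} + p_{jl} - 2 p_{il} p_{jl})$, which ignores the sample-size corrections of the full estimator; this is an assumption made for illustration and not necessarily what the hapflk software computes. The function name and the frequencies are hypothetical.

```python
def reynolds_distance(freqs_i, freqs_j):
    """Simplified Reynolds distance between two populations.

    freqs_i and freqs_j hold the frequency of the same reference allele
    at each SNP. Sample-size corrections are ignored (assumption).
    """
    numerator = 0.0
    denominator = 0.0
    for p_i, p_j in zip(freqs_i, freqs_j):
        numerator += (p_i - p_j) ** 2
        denominator += p_i + p_j - 2.0 * p_i * p_j
    return numerator / denominator


# Two SNPs, two populations (illustrative frequencies).
pop_a = [0.5, 0.2]
pop_b = [0.3, 0.4]
d = reynolds_distance(pop_a, pop_b)
print(f"Reynolds distance: {d:.4f}")  # 0.08 / 0.94 = 0.0851
```

Under the drift model, the numerator has expectation $(F_i + F_j)\, p_0 (1 - p_0)$ and the denominator $2 p_0 (1 - p_0)$ per SNP for independent populations, which is where the expectation $(F_i + F_j)/2$ comes from.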

Conclusions on the neutral model
- We have described a neutral model for population allele frequencies at a SNP.
- We can estimate the model parameters:
  - $p_0$: ancestral allele frequency, locus specific;
  - $F$: population kinship matrix, constant across loci.
- Note that other procedures could be used to estimate $F$. The hapflk software accepts any kinship matrix.
- In our context, detecting selection means identifying loci for which the neutral model is not a good fit.


The FLK statistic: goodness-of-fit of the neutral model
- We can think of our neutral model as a linear model: $p = \mathbf{1} p_0 + r$, with $r \sim \mathcal{N}(0, V)$ and $V = F\, p_0 (1 - p_0)$.
- A goodness-of-fit statistic for this model, estimated at a particular locus, is the deviance
  $$\mathrm{FLK} = (p - \mathbf{1}\hat{p}_0)^T V^{-1} (p - \mathbf{1}\hat{p}_0),$$
  named the FLK statistic (Bonhomme et al., 2010).
- Under $H_0$ (neutral model) for $n$ populations, FLK follows a $\chi^2(n - 1)$ distribution.
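The FLK computation at a single SNP can be sketched as follows. The kinship matrix and frequencies are illustrative values, the helper names are hypothetical, and $V$ is evaluated at the estimated $\hat{p}_0$, which is an assumption of this sketch. The closed-form p-value $e^{-x/2}$ is valid only for $\chi^2$ with 2 degrees of freedom, i.e. for the $n = 3$ populations used here.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def flk_statistic(kinship, freqs):
    """Return (FLK, p0_hat) for one SNP, with V = F * p0_hat * (1 - p0_hat)."""
    weights = solve(kinship, [1.0] * len(freqs))           # F^-1 1
    p0_hat = sum(w * p for w, p in zip(weights, freqs)) / sum(weights)
    residuals = [p - p0_hat for p in freqs]                # p - 1 * p0_hat
    finv_res = solve(kinship, residuals)                   # F^-1 (p - 1 * p0_hat)
    quad = sum(r * s for r, s in zip(residuals, finv_res))
    return quad / (p0_hat * (1.0 - p0_hat)), p0_hat


kinship = [
    [0.10, 0.05, 0.00],
    [0.05, 0.10, 0.00],
    [0.00, 0.00, 0.10],
]
freqs = [0.6, 0.6, 0.3]

flk, p0_hat = flk_statistic(kinship, freqs)
# Upper tail of a chi-square with n - 1 = 2 degrees of freedom.
p_value = math.exp(-flk / 2.0)
print(f"FLK = {flk:.4f}, p0_hat = {p0_hat:.4f}, p-value = {p_value:.4f}")
```

For other numbers of populations the upper tail of $\chi^2(n - 1)$ has no such simple form and would normally come from a statistics library.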

Relationship with other statistics
- If we were to assume a star-like population tree with equal branch lengths, then $F = I_n \bar{F}_{ST}$, where $\bar{F}_{ST}$ is the mean $F_{ST}$ over loci (genome-wide $F_{ST}$), and
  - $\hat{p}_0 = \bar{p}$
  - $\mathrm{FLK} = (n - 1)\, \frac{F_{ST}}{\bar{F}_{ST}}$
- The Lewontin and Krakauer (1973) statistic (LK) is $\mathrm{LK} = (n - 1)\, \frac{F_{ST}}{\bar{F}_{ST}}$.
- The LK statistic gives the same ranking as $F_{ST}$.
- LK (or $F_{ST}$) scans for selection assume a very particular evolutionary model for the populations.
- Outliers of LK (or $F_{ST}$) indicate a bad fit of this model. This might not be due to selection, but to a wrong evolutionary model under $H_0$.
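The reduction of FLK to LK under a star-like, equal-branch-length tree can be written out in a few steps. The derivation assumes that the single-locus $F_{ST}$ is defined as $s_p^2 / (\bar{p}(1 - \bar{p}))$, with $s_p^2$ the sample variance of allele frequencies across populations; this definition is an assumption of the sketch.

```latex
\begin{align*}
F &= \bar{F}_{ST}\, I_n
  \;\Longrightarrow\;
  \hat{p}_0 = \frac{\mathbf{1}^T F^{-1} p}{\mathbf{1}^T F^{-1} \mathbf{1}}
            = \frac{1}{n} \sum_{i=1}^{n} p_i = \bar{p}, \\
V &= \bar{F}_{ST}\, \bar{p}(1 - \bar{p})\, I_n, \\
\mathrm{FLK} &= (p - \mathbf{1}\bar{p})^T V^{-1} (p - \mathbf{1}\bar{p})
  = \frac{\sum_{i=1}^{n} (p_i - \bar{p})^2}{\bar{F}_{ST}\, \bar{p}(1 - \bar{p})}, \\
s_p^2 &= \frac{1}{n - 1} \sum_{i=1}^{n} (p_i - \bar{p})^2,
  \qquad F_{ST} = \frac{s_p^2}{\bar{p}(1 - \bar{p})}, \\
\mathrm{FLK} &= (n - 1)\, \frac{F_{ST}}{\bar{F}_{ST}} = \mathrm{LK}.
\end{align*}
```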


Principle
1. Incorporate a haplotype diversity model within the FLK framework.
2. Considering haplotypes as multi-allelic markers, use a multiallelic version of FLK (Bonhomme et al., 2010).
   - However, haplotypes are not ancestral alleles (recombination happens).
   - Modified multiallelic version: the hapFLK statistic (Fariello et al., 2013). Its distribution is unknown.

The Scheet and Stephens (a.k.a. fastPHASE) model
- Models the local similarity between haplotypes via a reduction of dimension: local clustering of haplotypes.
- The underlying clusters can be considered as local haplotypes. Their definition changes along the chromosome.
- The model is a Hidden Markov Model whose hidden states are the clusters.
- As for all mixture models, the number of components $K$ needs to be specified.

Using LD models for FLK
- Transform SNP genotypes into multiallelic genotypes, based on the posterior probability $P(Z_{il} = k \mid G_i)$, where
  - $Z_{il}$ is the underlying cluster for individual $i$ at SNP $l$;
  - $G_i$ is the observed (multilocus) SNP genotype.
- Consider haplotype clusters as alleles. The frequency of cluster $k$ at SNP $l$ within a population of $N$ individuals is
  $$p_{kl} = \frac{1}{N} \sum_{i} P(Z_{il} = k \mid G_i)$$
- Advantages:
  - No need for sliding windows.
  - The model can be estimated on unphased genotype data.
  - Missing data can be incorporated (e.g. a mixture of dense and sparse data...).
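The cluster-frequency formula amounts to averaging posterior cluster probabilities over the individuals of a population. A minimal sketch for one SNP follows; the posterior values and the function name are made up for illustration, whereas in practice the posteriors would come from the fitted LD model.

```python
def cluster_frequencies(posteriors):
    """Average posterior cluster probabilities over individuals.

    posteriors[i][k] is P(Z_il = k | G_i) for individual i at one SNP l.
    Returns the list of p_kl for k = 0 .. K-1.
    """
    n_individuals = len(posteriors)
    n_clusters = len(posteriors[0])
    return [sum(row[k] for row in posteriors) / n_individuals
            for k in range(n_clusters)]


# Four individuals from one population, K = 3 clusters (illustrative values).
posteriors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.1, 0.7],
]
freqs = cluster_frequencies(posteriors)
print(freqs)  # approximately [0.5, 0.25, 0.25]
```

Each row sums to one, so the resulting cluster frequencies also sum to one and can be treated like allele frequencies at a multiallelic marker.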


Example data: sheep from Northern Europe
- Kijas et al. (2012), PLoS Biology
- 6 populations + outgroup (Soay), 388 individuals, 49K SNPs
- Available at

Before diving in...
Remember the assumptions underlying the neutral model:
- Population tree
- Pure drift model (no mutations, no admixture)
- Small $F_i$ (say < 0.2)
This means:
- Discard strongly bottlenecked or admixed populations.
- Consider that low-frequency variants are more likely to have appeared after the population split.
Perform a diversity analysis beforehand:
- Population structure (STRUCTURE, PCA, treemix...)
- Within-population kinship between individuals, to identify a set of unrelated individuals

Get the software :)
- Available for 64-bit Linux and Mac OS X.
- Estimating the kinship matrix requires R with the ape and phangorn packages.

Run single SNP analysis
hapflk reads PLINK files (ped/map or bed/bim/fam). The first column (FID) must give the population name.

    hapflk --bfile NorthernSheep --outgroup Soay

    [ 00:00:00 ] Reading Input Files
    [ 00:00:58 ] Computing Allele Frequencies NewZealandRomney
    [ 00:02:21 ] Computing Reynolds distances
    [ 00:02:21 ] Computing Kinship Matrix
    Loading required package: ape
    [ 00:02:21 ] Computing FLK tests
    [ 00:02:32 ] Writing down results
    [ 00:02:36 ] The End

NB: single SNP analysis is fast.

Output files
- hapflk_reynolds.txt: Reynolds distance matrix
- hapflk_poptree.pdf: population tree figure
- hapflk_kinship.r: R code for estimating the kinship matrix
- hapflk_fij.txt: kinship matrix
- hapflk.frq: allele frequencies
- hapflk-snp-reynolds.txt: Reynolds distances in the region (more later)
- hapflk.flk: FLK results

Population tree
Figure: estimated population tree with leaves IrishSuffolk, NewZealandRomney, Galway, GermanTexel, ScottishTexel and NewZealandTexel.

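Fit of the χ² distribution (1)

    flk <- read.table("hapflk.flk", header = TRUE)
    # keep SNPs with an intermediate estimated ancestral frequency
    mysnps <- flk$pzero > 0.05 & flk$pzero < 0.95
    hist(flk$flk[mysnps], n = 50, freq = FALSE, xlab = "FLK", main = "")
    # evaluation grid for the density (assumed; xx is not defined on the slide)
    xx <- seq(0, 30, by = 0.1)
    lines(xx, dchisq(xx, df = 5), lwd = 2)

Figure: histogram of FLK (x-axis: FLK, y-axis: density) with the χ²(5) density overlaid.
- Good overall fit.
- Slightly fewer high values than expected under a χ²(5).
- Relatively high drift ($F_i$ about 0.16).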

Fit of the χ² distribution (2)

    hist(flk$flk[!mysnps], n = 50, freq = FALSE, xlab = "FLK", main = "")

Figure: histogram of FLK for the remaining SNPs (x-axis: FLK, y-axis: density).
- No fit.
- For these SNPs, our neutral model is clearly wrong (no mutation in the tree).
- Proceed with caution for SNPs with low or high $\hat{p}_0$.

FLK Manhattan plot
Figure: FLK Manhattan plot (y-axis: -log10(p)).
- Large drift affects the power of single SNP tests.

Let's go the haplotype way
- Wait..., what K? Use the fastPHASE cross-validation routine on your favorite chromosome (the big one). Hint:

      plink --chr 1 --recode-fastphase...

- Run hapFLK on one chromosome:

      hapflk2 --bfile NorthernSheep --outgroup Soay --kinship hapflk_fij.txt --chr 1 -K 40 --ncpu 7 -p OAR1

- One run for each chromosome.
- Note: if the data are phased, or come from inbred lines, this can be specified with --phased or --inbred. It makes fitting the LD model much faster.

hapFLK output files
- hapflk.hapflk: hapFLK results
- hapflk.kfrq.fit_{n}.bz2: haplotype cluster frequencies
The fastPHASE model is estimated several (T) times (20 by default), and the hapFLK statistic is averaged over these T fits. For each fit, the haplotype cluster frequencies are given.

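Distribution of the hapFLK statistic
- In this particular case, the distribution is close to Normal, plus outliers.
- Robust estimation of the Normal distribution parameters:

      require(MASS)                 # provides rlm()
      # hapflk: vector of hapFLK values, assumed already loaded
      mod <- rlm(hapflk ~ 1)        # robust fit of an intercept-only model
      mu <- mod$coefficients[1]     # robust mean
      ss <- mod$s                   # robust scale
      pvalue <- pnorm(hapflk, mean = mu, sd = ss, lower.tail = FALSE)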
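The idea behind the robust fit can be sketched in plain Python. Note that this is an analogous technique and not a reimplementation of `rlm`: it computes a Huber M-estimate of location with a fixed scale taken from the median absolute deviation (MAD), and converts each value into an upper-tail Normal p-value. The tuning constant 1.345, the function names and the toy data are assumptions chosen for illustration.

```python
import math
import statistics


def huber_location(values, k=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location, with a fixed MAD-based scale."""
    center = statistics.median(values)
    # 1.4826 makes the MAD consistent with the standard deviation of a Normal.
    scale = 1.4826 * statistics.median([abs(v - center) for v in values])
    mu = center
    for _iteration in range(max_iter):
        # Observations further than k * scale from mu are down-weighted.
        weights = [1.0 if abs(v - mu) <= k * scale else k * scale / abs(v - mu)
                   for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, scale


def upper_tail_pvalue(x, mu, sd):
    """P(X > x) for X ~ Normal(mu, sd)."""
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2.0)))


# Toy statistic values: a roughly symmetric bulk plus one clear outlier.
stat_values = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 15.0]
mu, scale = huber_location(stat_values)
pvalues = [upper_tail_pvalue(v, mu, scale) for v in stat_values]
print(f"robust mean = {mu:.3f}, robust scale = {scale:.3f}")
print(f"p-value of the outlier = {pvalues[-1]:.2e}")
```

The point of the robust fit is that the outlier barely moves the estimated mean and scale, so the null distribution is calibrated on the bulk of the genome and outlying regions receive very small p-values.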

hapFLK Manhattan plot
Figure: hapFLK Manhattan plot.
- hapFLK reveals clear outlying regions.

Looking at a particular region
Once outlying regions are found, we want to know which population(s) experienced a selection event.
- Look at local allele frequencies.
- Build local population trees to find which branch(es) have been affected.
- (Eigen decomposition of hapFLK)

Local SNP and cluster frequencies
- chr 2: selection in Texel breeds, GDF8 (MSTN) mutation.
- An R script for haplotype cluster plots is provided on the hapflk webpage.

Local SNP and cluster frequencies
- chr 14: less obvious.

Local population trees
- Script for making these trees to be released...

Example on the 1000 Bull genomes data
- Differentiation-based tests can find causal mutations: example of coat color mutations (MC1R) in the 1000 Bull genomes dataset.
Figure: test statistic along the region (x-axis: position in Kbp).

References
- Nicholson et al. (2002). J. Roy. Stat. Soc. B, 64(4).
- Laval et al. (2002). Genetics Selection Evolution, 34(4).
- Bonhomme et al. (2010). Genetics, 186(1).
- Fariello et al. (2013). Genetics, 193(3).


More information

Expected complete data log-likelihood and EM

Expected complete data log-likelihood and EM Expected complete data log-likelihood and EM In our EM algorithm, the expected complete data log-likelihood Q is a function of a set of model parameters τ, ie M Qτ = log fb m, r m, g m z m, l m, τ p mz

More information

How robust are the predictions of the W-F Model?

How robust are the predictions of the W-F Model? How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population

More information

CONSERVATION AND THE GENETICS OF POPULATIONS

CONSERVATION AND THE GENETICS OF POPULATIONS CONSERVATION AND THE GENETICS OF POPULATIONS FredW.Allendorf University of Montana and Victoria University of Wellington and Gordon Luikart Universite Joseph Fourier, CNRS and University of Montana With

More information

USING LINEAR PREDICTORS TO IMPUTE ALLELE FREQUENCIES FROM SUMMARY OR POOLED GENOTYPE DATA. By Xiaoquan Wen and Matthew Stephens University of Chicago

USING LINEAR PREDICTORS TO IMPUTE ALLELE FREQUENCIES FROM SUMMARY OR POOLED GENOTYPE DATA. By Xiaoquan Wen and Matthew Stephens University of Chicago Submitted to the Annals of Applied Statistics USING LINEAR PREDICTORS TO IMPUTE ALLELE FREQUENCIES FROM SUMMARY OR POOLED GENOTYPE DATA By Xiaoquan Wen and Matthew Stephens University of Chicago Recently-developed

More information

Big Idea #1: The process of evolution drives the diversity and unity of life

Big Idea #1: The process of evolution drives the diversity and unity of life BIG IDEA! Big Idea #1: The process of evolution drives the diversity and unity of life Key Terms for this section: emigration phenotype adaptation evolution phylogenetic tree adaptive radiation fertility

More information

Adaptation and genetics. Block course Zoology & Evolution 2013, Daniel Berner

Adaptation and genetics. Block course Zoology & Evolution 2013, Daniel Berner Adaptation and genetics Block course Zoology & Evolution 2013, Daniel Berner 2 Conceptual framework Evolutionary biology tries to understand the mechanisms that lead from environmental variation to biological

More information

Introduction to Natural Selection. Ryan Hernandez Tim O Connor

Introduction to Natural Selection. Ryan Hernandez Tim O Connor Introduction to Natural Selection Ryan Hernandez Tim O Connor 1 Goals Learn about the population genetics of natural selection How to write a simple simulation with natural selection 2 Basic Biology genome

More information

Neutral Theory of Molecular Evolution

Neutral Theory of Molecular Evolution Neutral Theory of Molecular Evolution Kimura Nature (968) 7:64-66 King and Jukes Science (969) 64:788-798 (Non-Darwinian Evolution) Neutral Theory of Molecular Evolution Describes the source of variation

More information

Population Genetics & Evolution

Population Genetics & Evolution The Theory of Evolution Mechanisms of Evolution Notes Pt. 4 Population Genetics & Evolution IMPORTANT TO REMEMBER: Populations, not individuals, evolve. Population = a group of individuals of the same

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA

More information

2. Map genetic distance between markers

2. Map genetic distance between markers Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,

More information

Lecture 13: Variation Among Populations and Gene Flow. Oct 2, 2006

Lecture 13: Variation Among Populations and Gene Flow. Oct 2, 2006 Lecture 13: Variation Among Populations and Gene Flow Oct 2, 2006 Questions about exam? Last Time Variation within populations: genetic identity and spatial autocorrelation Today Variation among populations:

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

Production type of Slovak Pinzgau cattle in respect of related breeds

Production type of Slovak Pinzgau cattle in respect of related breeds Original Paper Production type of Slovak Pinzgau cattle in respect of related breeds Veronika Šidlová* 1, Nina Moravčíková 1, Anna Trakovická 1, Maja Ferenčaković 2, Ino Curik 2, Radovan Kasarda 1 1 Slovak

More information

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies. Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus

More information

Breeding Values and Inbreeding. Breeding Values and Inbreeding

Breeding Values and Inbreeding. Breeding Values and Inbreeding Breeding Values and Inbreeding Genotypic Values For the bi-allelic single locus case, we previously defined the mean genotypic (or equivalently the mean phenotypic values) to be a if genotype is A 2 A

More information

Genetics: Early Online, published on February 26, 2016 as /genetics Admixture, Population Structure and F-statistics

Genetics: Early Online, published on February 26, 2016 as /genetics Admixture, Population Structure and F-statistics Genetics: Early Online, published on February 26, 2016 as 10.1534/genetics.115.183913 GENETICS INVESTIGATION Admixture, Population Structure and F-statistics Benjamin M Peter 1 1 Department of Human Genetics,

More information

Linkage disequilibrium and the genetic distance in livestock populations: the impact of inbreeding

Linkage disequilibrium and the genetic distance in livestock populations: the impact of inbreeding Genet. Sel. Evol. 36 (2004) 281 296 281 c INRA, EDP Sciences, 2004 DOI: 10.1051/gse:2004002 Original article Linkage disequilibrium and the genetic distance in livestock populations: the impact of inbreeding

More information

Use of hidden Markov models for QTL mapping

Use of hidden Markov models for QTL mapping Use of hidden Markov models for QTL mapping Karl W Broman Department of Biostatistics, Johns Hopkins University December 5, 2006 An important aspect of the QTL mapping problem is the treatment of missing

More information

Y-STR: Haplotype Frequency Estimation and Evidence Calculation

Y-STR: Haplotype Frequency Estimation and Evidence Calculation Calculating evidence Further work Questions and Evidence Mikkel, MSc Student Supervised by associate professor Poul Svante Eriksen Department of Mathematical Sciences Aalborg University, Denmark June 16

More information

Modelling Genetic Variations with Fragmentation-Coagulation Processes

Modelling Genetic Variations with Fragmentation-Coagulation Processes Modelling Genetic Variations with Fragmentation-Coagulation Processes Yee Whye Teh, Charles Blundell, Lloyd Elliott Gatsby Computational Neuroscience Unit, UCL Genetic Variations in Populations Inferring

More information

, Helen K. Pigage 1, Peter J. Wettstein 2, Stephanie A. Prosser 1 and Jon C. Pigage 1ˆ. Jeremy M. Bono 1*

, Helen K. Pigage 1, Peter J. Wettstein 2, Stephanie A. Prosser 1 and Jon C. Pigage 1ˆ. Jeremy M. Bono 1* Bono et al. BMC Evolutionary Biology (2018) 18:139 https://doi.org/10.1186/s12862-018-1248-4 RESEARCH ARTICLE Genome-wide markers reveal a complex evolutionary history involving divergence and introgression

More information

Using haplotypes for the prediction of allelic identity to fine-map QTL: characterization and properties

Using haplotypes for the prediction of allelic identity to fine-map QTL: characterization and properties Using haplotypes for the prediction of allelic identity to fine-map QTL: characterization and properties Laval Jacquin 1,2,3 Corresponding author Email: Julien.Jacquin@toulouse.inra.fr Jean-Michel Elsen

More information

I N N O V A T I O N L E C T U R E S (I N N O l E C) Petr Kuzmič, Ph.D. BioKin, Ltd. WATERTOWN, MASSACHUSETTS, U.S.A.

I N N O V A T I O N L E C T U R E S (I N N O l E C) Petr Kuzmič, Ph.D. BioKin, Ltd. WATERTOWN, MASSACHUSETTS, U.S.A. I N N O V A T I O N L E C T U R E S (I N N O l E C) Binding and Kinetics for Experimental Biologists Lecture 2 Evolutionary Computing: Initial Estimate Problem Petr Kuzmič, Ph.D. BioKin, Ltd. WATERTOWN,

More information

The Quantitative TDT

The Quantitative TDT The Quantitative TDT (Quantitative Transmission Disequilibrium Test) Warren J. Ewens NUS, Singapore 10 June, 2009 The initial aim of the (QUALITATIVE) TDT was to test for linkage between a marker locus

More information

Demographic Inference with Coalescent Hidden Markov Model

Demographic Inference with Coalescent Hidden Markov Model Demographic Inference with Coalescent Hidden Markov Model Jade Y. Cheng Thomas Mailund Bioinformatics Research Centre Aarhus University Denmark The Thirteenth Asia Pacific Bioinformatics Conference HsinChu,

More information

opulation genetics undamentals for SNP datasets

opulation genetics undamentals for SNP datasets opulation genetics undamentals for SNP datasets with crocodiles) Sam Banks Charles Darwin University sam.banks@cdu.edu.au I ve got a SNP genotype dataset, now what? Do my data meet the requirements of

More information

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees: MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation

More information

Spatial localization of recent ancestors for admixed individuals

Spatial localization of recent ancestors for admixed individuals G3: Genes Genomes Genetics Early Online, published on November 3, 2014 as doi:10.1534/g3.114.014274 Spatial localization of recent ancestors for admixed individuals Wen-Yun Yang 1, Alexander Platt 2, Charleston

More information

Mapping QTL to a phylogenetic tree

Mapping QTL to a phylogenetic tree Mapping QTL to a phylogenetic tree Karl W Broman Department of Biostatistics & Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman Human vs mouse www.daviddeen.com 3 Intercross

More information

Population Genetics: a tutorial

Population Genetics: a tutorial : a tutorial Institute for Science and Technology Austria ThRaSh 2014 provides the basic mathematical foundation of evolutionary theory allows a better understanding of experiments allows the development

More information

Phasing via the Expectation Maximization (EM) Algorithm

Phasing via the Expectation Maximization (EM) Algorithm Computing Haplotype Frequencies and Haplotype Phasing via the Expectation Maximization (EM) Algorithm Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 14, 2010 Outline

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Lies, damn lies, and. genomics

Lies, damn lies, and. genomics Lies, damn lies, and. genomics you, your data, your perceptions and reality Christopher West Wheat Goal of this lecture Present a critical view of ecological genomics Make you uncomfortable by sharing

More information