Populations in statistical genetics

Similar documents
Similarity Measures and Clustering In Genetics

Supplementary Materials for

Supplementary Materials: Efficient moment-based inference of admixture parameters and sources of gene flow

Mathematical models in population genetics II

Notes on Population Genetics

Learning ancestral genetic processes using nonparametric Bayesian models

Genetic Drift in Human Evolution

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Processes of Evolution

Detecting selection from differentiation between populations: the FLK and hapflk approach.

The Combinatorial Interpretation of Formulas in Coalescent Theory

6 Introduction to Population Genetics

6 Introduction to Population Genetics

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Introduction to Digital Evolution Handout Answers

Multivariate analysis of genetic data: an introduction

Frequency Spectra and Inference in Population Genetics

Evolution. Species Changing over time

1. Understand the methods for analyzing population structure in genomes

Theory a well supported testable explanation of phenomenon occurring in the natural world.

How robust are the predictions of the W-F Model?

The problem Lineage model Examples. The lineage model

Methods for Cryptic Structure. Methods for Cryptic Structure

Genetic Association Studies in the Presence of Population Structure and Admixture

Evolution & Natural Selection. Part 2

Perplexing Observations. Today: Thinking About Darwinian Evolution. We owe much of our understanding of EVOLUTION to CHARLES DARWIN.

The implications of neutral evolution for neutral ecology. Daniel Lawson Bioinformatics and Statistics Scotland Macaulay Institute, Aberdeen

Calculation of IBD probabilities

OVERVIEW. L5. Quantitative population genetics

Calculation of IBD probabilities

Phylogenetic Networks with Recombination

REVIEW 6: EVOLUTION. 1. Define evolution: Was not the first to think of evolution, but he did figure out how it works (mostly).

Sympatric Speciation

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Chapter 2 Section 1 discussed the effect of the environment on the phenotype of individuals light, population ratio, type of soil, temperature )

Evolutionary change. Evolution and Diversity. Two British naturalists, one revolutionary idea. Darwin observed organisms in many environments

Topic 7: Evolution. 1. The graph below represents the populations of two different species in an ecosystem over a period of several years.

Evolution. Species Changing over time

NCEA Level 2 Biology (91157) 2017 page 1 of 5 Assessment Schedule 2017 Biology: Demonstrate understanding of genetic variation and change (91157)

Chapter 16: Evolutionary Theory

p(d g A,g B )p(g B ), g B

Chapter 16. Table of Contents. Section 1 Genetic Equilibrium. Section 2 Disruption of Genetic Equilibrium. Section 3 Formation of Species

Big Idea #1: The process of evolution drives the diversity and unity of life

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

Mechanisms of Evolution Microevolution. Key Concepts. Population Genetics

Linkage and Linkage Disequilibrium

Robust demographic inference from genomic and SNP data

Effective population size and patterns of molecular evolution and variation

EVOLUTION. Evolution - changes in allele frequency in populations over generations.

Evolution of Populations. Chapter 17

Week 7.2 Ch 4 Microevolutionary Proceses

MS-LS3-1 Heredity: Inheritance and Variation of Traits

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Genetic diversity and population structure in rice. S. Kresovich 1,2 and T. Tai 3,5. Plant Breeding Dept, Cornell University, Ithaca, NY

Population Genetics I. Bio

Challenges when applying stochastic models to reconstruct the demographic history of populations.

Reproduction- passing genetic information to the next generation

Introduction to Advanced Population Genetics

Population Genetics & Evolution

Evolution & Natural Selection

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Supporting Information

Evolution Problem Drill 10: Human Evolution

Mathematical Population Genetics II

Mutation, Selection, Gene Flow, Genetic Drift, and Nonrandom Mating Results in Evolution

A Phylogenetic Network Construction due to Constrained Recombination

Evolution and Natural Selection (16-18)

Biology 110 Survey of Biology. Quizzam

Mathematical Population Genetics II

Introduction to population genetics & evolution

Speciation. Today s OUTLINE: Mechanisms of Speciation. Mechanisms of Speciation. Geographic Models of speciation. (1) Mechanisms of Speciation

Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency

Chapter 5 Evolution of Biodiversity. Sunday, October 1, 17

Enduring Understanding: Change in the genetic makeup of a population over time is evolution Pearson Education, Inc.

Effect of Genetic Divergence in Identifying Ancestral Origin using HAPAA

Unifying theories of molecular, community and network evolution 1

Gene Pool Genetic Drift Geographic Isolation Fitness Hardy-Weinberg Equilibrium Natural Selection

Evolutionary Genetics: Part 0.2 Introduction to Population genetics

Learning Ancestral Genetic Processes using Nonparametric Bayesian Models

Chapters AP Biology Objectives. Objectives: You should know...

PCA, admixture proportions and SFS for low depth NGS data. Anders Albrechtsen

Formalizing the gene centered view of evolution

Asymptotic distribution of the largest eigenvalue with application to genetic data

EVOLUTION. HISTORY: Ideas that shaped the current evolutionary theory. Evolution change in populations over time.

Evolution Test Review

Lecture 13: Population Structure. October 8, 2012

The Wright-Fisher Model and Genetic Drift

(Genome-wide) association analysis

First go to

Big Idea 1: The process of evolution drives the diversity and unity of life.

Stochastic Demography, Coalescents, and Effective Population Size

chatper 17 Multiple Choice Identify the choice that best completes the statement or answers the question.

AP Curriculum Framework with Learning Objectives

Enduring understanding 1.A: Change in the genetic makeup of a population over time is evolution.

Computational Systems Biology: Biology X

Statistical Models in Evolutionary Biology An Introductory Discussion

Chapter 26 Phylogeny and the Tree of Life

Topic outline: Review: evolution and natural selection. Evolution 1. Geologic processes 2. Climate change 3. Catastrophes. Niche.

Ohio Tutorials are designed specifically for the Ohio Learning Standards to prepare students for the Ohio State Tests and end-ofcourse

Linear Regression (1/1/17)

Transcription:

Populations in statistical genetics What are they, and how can we infer them from whole genome data? Daniel Lawson Heilbronn Institute, University of Bristol www.paintmychromosomes.com Work with: January 2014, Cambridge 1 / 35

Overview Some views of population A statistical definition Generative approaches Inference from sparse weakly linked data (STRUCTURE) Inference from dense linked data (finestructure) Selected results 2 / 35

What is a population in general? Like species, populations are elusive objects. Definitions are a function of knowledge and application Some definitions - Need not be biologically meaningful Example: Individuals found on same island Example: Sample location due to clustered sampling procedure Example: Disease status for case/control studies In statistics: we generalize from samples to a population Share: individuals are equivalent under some measure But do populations really exist? Does it matter if they are useful? 3 / 35

Diversion: some fuzzy definitions of species Are definitions of species analogous to definitions of populations? Typological species: Organisms that share the same set of phenotypes Ecological species: Organisms that compete for the same environment Phylogenetic species: Organisms with a single common ancestor Biological species: Organisms that can share DNA via sexual reproduction (Western/Eastern) Meadow Lark: http://evolution.berkeley.edu 4 / 35

What is a population? Motivation from genetics We are interested in non-subjective definitions of populations. Individuals have equivalent genetic ancestry This depends on the data available...... and the model used Knowledge driven definition Requires defining equivalent! 5 / 35

The birds and the bees (vs the plants and bacteria) Sex dramatically affects genetic transmission Different population concept required Sexual species Generalise the Biological Species concept Random mating within a population Try to detect deviations from random mating to identify populations Asexual species Generalise the Ecological Species concept Neutral competition within a population Try to detect deviations from neutrality to identify populations Not examined further in this talk 6 / 35

Top down approaches Start from a theoretical generative model of reproduction. For example: Individuals move between populations via migration Discrete generations (for simplicity) Each generation randomly chooses parent(s) within a population Population structure: individuals migrate between populations but mate randomly within each 7 / 35

Ancestry Process Time Present 8 / 35

Populations as Demes Demes Migration Ammerman and Cavalli-Sforza, 1984 The Neolithic Transition and the Genetics of Populations in Europe. 9 / 35

Diversion: Principal Components Analysis PCA is widely used in genetics, and helpful for visualisation It does not make any explicit modelling assumptions... BUT to interpret the output, you do! Just rotating the data Similar individuals tend to be close Differences shared by many individuals tend to appear first Each component describes a different direction of variation If we are lucky, these correspond to real shared drift events There is no unique interpretation of PCA (see McVean 2009) See Lawson & Falush 2012 for details. 10 / 35

Example Time in the past A) Historical Scenario d1 d2 Present Pop A Pop B1 Pop B2 C) PCA of A B) Spatial Scenario D) PCA of B Populations Migration Space PC2 0.2 PC2 0.2 0.1 f(d2) B2 0.2 PC1 0.2 0.1 0 0.1 0.2 f(d1) A 0.1 PC1 0.2 0 0.2 B1 0.2 11 / 35

Bottom up approaches Start from a definition of population as equivalent individuals Within a population, individuals are randomly mating Small samples of large populations: individuals are approximately independent Smaller populations: relationships must be accounted for What does random mating mean for the data? How are populations related? 12 / 35

Top down vs Bottom up A) Generative Model B) Statistical Model C) Full Statistical Model Time 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.6 0.9 1.0 Density 0.0 0.6 0.9 1.0 Time 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.6 0.9 1.0 Genetic Postition Genetic Postition Genetic Postition A) Generative approaches are best in theory, if we can make the model match reality But, hard to use in practice - how to do inference? B) Bottom up approaches are approximate - might lose power C) But can be refined until they are close to the generating process 13 / 35

Ancestry Process - Ancestral Recombination Graph Time Take limit N with N/T constant Recombination rate ρ (for tract length) Ignore unbranching ancestors Present 14 / 35

Structure model Pritchard, Stephens & Donnelly 2000. Populations are large and well mixed SNPs D il are unlinked* loci have some ancestral frequency p 0l P( ) Population k has frequency p kl drifted from ancestral p 0l ** p kl Dirichlet(p 0l ) Individual i is in population k if Q ik = 1, assigned by P(D il p Qik,l) l Individuals in the same population are exchangeable with respect to the SNP frequencies * Solutions to this have been explored (computationally inconvenient) ** Valid approximation, originally derived by Wright 15 / 35

Single SNP with populations Ancestral Population Mutation Some SNP distribution Present populations 'Independent drift' of SNP distribution 16 / 35

Scaling STRUCTURE STRUCTURE approach has a parameter for every SNP. But: Assuming that drift is weak, p 0l = E(D l ) and: p(p kl ) N (p 0l, p 0l (1 p 0l )) Probability SNP l is shared not by chance: X ijl = D il D jl /p 0l + (1 D il )(1 D jl )/(1 p 0l ) Invoke the Central limit theorem: X ij = L X ijl N(µ ij, σij) 2 l=1 This is the Coancestry Matrix It is a sufficient statistic for p(d Q) µ and σ known and same for all individuals in a population Exchangability again! Now lets think about linkage... 17 / 35

Ancestral Recombination Graph Time towards present Hein, Schierup and Wiuf 'Gene Genealogies, Variation and Evolution', OUP 2005 18 / 35

ChromoPainter 19 / 35

ChromoPainter Hidden Markov Model Reference Panel x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x Reconstruction x x x x x Data x x x x x x 20 / 35

FineSTRUCTURE model Each population has a characteristic rate P ab of sharing chunks with each other population Individuals are again exchangeable within populations Each recombination event has probability ˆP ab = P ab /ˆn b when coming from an individual in population b into population a Dirichlet Process prior on the parameters P a We integrate out P, leaving no population level parameters We can put a meaningful prior on the variation of P between populations, and the number of populations In practice these details don t matter much 21 / 35

Example: Africa HGDP 0.86 0.99 0.86 0.86 0.99 0.90 0.93 0.84 0.84 0.86 1000 BiakaPygmy 900 MbutiPygmy San BantuSouthAfrica BantuSouthAfrica BantuKenya 800 700 600 500 Yoruba 400 300 Mandenka 200 100 Mandenka Yoruba BantuKenya BantuSouthAfrica BantuSouthAfrica San MbutiPygmy BiakaPygmy 22 / 35

Ancient Genomes: fennoscandia.blogspot.co.uk Ajv70 Gotland hunter gatherer fennoscandia.blogspot.co.uk/2013/11/ajv70-and-modern-european-variation-ii.html 444k SNPs 23 / 35

History of populations We have assumed that we observe individuals from real populations Populations differ by genetic drift This works, even if there is historical migration, provided that the mixture fractions are equal (exchangeability) Real individuals are related by a combination of drift and admixture Ancestral population graph Time Split Split Migration Bottleneck Migration Populations 24 / 35

Admixture A) Discrete Admixture B) Continuous Admixture Time in the past Present Pop AB1 Pop A Pop B1 Pop B2 C) PCA of A Time in the past Present Pop AB1 Pop A Pop B1 Pop B2 C) PCA of B A 0.2 0.1 0.2 0.1 0.1 0.2 AB1 x 0.1 0.2 PC2 B2 PC1 B1 A 0.1 0.2 0.1 0.1 0.2 AB1 0.2 0.1 0.2 PC2 B2 PC1 B1 Admixture describes mixture without drift. 25 / 35

Admixture A) Non admixed populations B) Mixing parameter 0.1 Density 0.00 0.10 0.20 10 5 0 5 10 Genetic Location Density 0.00 0.05 0.10 0.15 10 5 0 5 10 Genetic Location Density 0.00 0.02 0.04 0.06 0.08 C) Mixing parameter 1 10 5 0 5 10 Genetic Location Density 0.00 0.04 0.08 0.12 Continuous admixture is a problem. D) Mixing parameter 5 10 5 0 5 10 Genetic Location 26 / 35

Admixture models STRUCTURE can infer pure populations from admixed populations Population assignment Q ik sums to 1 without requiring a single element Interpretation: Observed individuals are mixtures of pure populations, without drift How can we tell apart: Large drift, mixed by admixture? small drift without admixture? Solution: SNPs have fixed in some populations p ik = 0 (or 1) FineSTRUCTURE cannot use this description, as we ve integrated out these details 27 / 35

Drift model See each drift event as independent Population assignment Q ik now takes any value Amount of each drift event retained by individual i Reconstructs the coancestry matrix Requires a strong prior to obtain a unique solution Pop AB1 Drift Components 28 / 35

29 / 35

Projects Siberian cold adaptation (Alexia & Toomas, in review) GlobeTrotter (Hellenthal et al: very accurate admixture dating using ChromoPainter, to appear in Science) Peopling of the British Isles (in review) UK10K, ALSPAC (4K whole genomes, use in genome-wide association studies) Highly recombining asexuals (fungus, bacteria) Model improvements: Relatedness Admixture/history of populations Complete recoding for usability Efficient computation (fastfinestructure) 30 / 35

See www.paintmychromosomes.com Lawson, Hellenthal, Myers & Falush, Inference of population structure using dense haplotype data, 2012. PLoS Genetics. Lawson & Falush Similarity matrices and clustering algorithms for population identifcation using genetic data, 2012. ARHG. Lawson 2014 Populations in statistical genetics modelling and inference, in Populations in the Human Sciences, Eds. Kreager, Capelli, Ulijaszek & Winney. 31 / 35

Relatedness What if individuals within a population are related? 32 / 35

Relatedness Grandparents Parents Cousins Relatedness IBD Samples are cousins This is very easy to tell from their tract length distribution Excluding these tracts, we sample from the population distribution of chunks Multiple ordering model in progress 33 / 35

Choice of measure IBS Measure range = ( 0.832, 0.843 ) 6.52 COV Measure range = ( 5246, 3766 ) 4.04 ESU Measure range = ( 0.0339, 0.0201 ) 4.04 C2 5.54 C2 3.31 C2 3.33 4.57 2.57 2.61 C1 3.6 C1 1.83 C1 1.89 2.62 1.1 1.18 B2 1.65 B2 0.36 B2 0.46 0.68 0.37 0.26 B1 0.29 B1 1.11 B1 0.97 1.27 1.84 1.69 A 2.24 A 2.58 A 2.4 A B1 B2 C1 C2 3.21 A B1 B2 C1 C2 3.32 A B1 B2 C1 C2 3.12 FHS Measure range = ( 366346, 434550 ) 4.39 IBD Measure range = ( 26290, 164376 ) 4.52 CPL Measure range = ( 1774, 2131 ) 4.52 C2 3.6 C2 3.84 C2 3.78 2.81 3.17 3.04 C1 2.01 C1 2.49 C1 2.3 1.22 1.81 1.56 B2 0.43 B2 1.14 B2 0.82 0.37 0.46 0.08 B1 1.16 B1 0.22 B1 0.65 1.95 0.9 1.39 A 2.75 A 1.57 A 2.13 A B1 B2 C1 C2 3.54 A B1 B2 C1 C2 2.25 A B1 B2 C1 C2 2.87 34 / 35

Choice of measure Between / Within Population Distance 3.5 3.0 2.5 2.0 1.5 1.0 0.5 Number of regions 0 20 40 60 80 100 CPL IBD ESU FHS COV IBS 35 / 35