Lecture 13: Population Structure. October 8, 2012

Last Time
- Effective population size calculations
- Historical importance of drift: shifting balance or noise?
- Population structure

Today
- Course feedback
- The F-statistics
- Sample calculations of $F_{ST}$
- Defining populations on genetic criteria

Midterm Course Evaluations
Based on five responses (it's not too late to have an impact!):
- Lectures are generally OK
- Labs are valuable, but better organization and more feedback are needed
- Difficulty level is OK
- Book is awful

F-Coefficients
- Quantification of the structure of genetic variation in populations: population structure
- Partition variation among the Total Population (T), Subpopulations (S), and Individuals (I)

F-Coefficients
Combine different sources of reduction in expected heterozygosity into one equation:
$$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$$
- $F_{IT}$: overall deviation from H-W expectations
- $F_{ST}$: deviation due to subpopulation differentiation
- $F_{IS}$: deviation due to inbreeding within populations

F-Coefficients and IBD
View F-statistics as probability of Identity by Descent (IBD) for different samples:
$$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$$
- $F_{IT}$: overall probability of IBD
- $F_{ST}$: probability of IBD for 2 individuals in a subpopulation
- $F_{IS}$: probability of IBD within an individual

F-Statistics Can Measure Departures from Expected Heterozygosity Due to Wahlund Effect
$$F_{ST} = \frac{H_T - H_S}{H_T}, \qquad F_{IS} = \frac{H_S - H_I}{H_S}, \qquad F_{IT} = \frac{H_T - H_I}{H_T}$$
where
- $H_T$ is the average expected heterozygosity in the total population
- $H_S$ is the average expected heterozygosity in subpopulations
- $H_I$ is observed heterozygosity within a subpopulation
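The three ratios can be computed directly from the heterozygosities. The following is a minimal sketch, not part of the original slides; the function name `f_statistics` and the input values are hypothetical and chosen only for illustration.

```python
def f_statistics(h_i, h_s, h_t):
    """Return (F_IS, F_ST, F_IT) from observed (h_i) and expected (h_s, h_t) heterozygosity."""
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Hypothetical heterozygosities (illustrative only, not from the lecture).
h_i, h_s, h_t = 0.30, 0.40, 0.50
f_is, f_st, f_it = f_statistics(h_i, h_s, h_t)
print(f"F_IS={f_is:.3f}  F_ST={f_st:.3f}  F_IT={f_it:.3f}")

# The coefficients satisfy 1 - F_IT = (1 - F_ST)(1 - F_IS).
assert abs((1 - f_it) - (1 - f_st) * (1 - f_is)) < 1e-12
```

The final assertion checks that the ratio definitions reproduce the identity $1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$.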

Calculating $F_{ST}$
Recessive allele for flower color: $B_2B_2$ = white; $B_1B_1$ and $B_1B_2$ = dark pink

Subpopulation 1 (White: 10, Dark: 10):
- F(white) = 10/20 = 0.5
- $F(B_2)_1 = q_1 = \sqrt{0.5} = 0.707$
- $p_1 = 1 - 0.707 = 0.293$

Subpopulation 2 (White: 2, Dark: 18):
- F(white) = 2/20 = 0.1
- $F(B_2)_2 = q_2 = \sqrt{0.1} = 0.32$
- $p_2 = 1 - 0.32 = 0.68$

Calculating $F_{ST}$
Calculate Average $H_E$ of Subpopulations ($H_S$). For 2 subpopulations:
$$H_S = \frac{\sum_i 2 p_i q_i}{2} = \frac{2(0.707)(0.293) + 2(0.32)(0.68)}{2} = 0.425$$
Calculate Average $H_E$ for Merged Subpopulations ($H_T$):
- F(white) = 12/40 = 0.3
- $q = \sqrt{0.3} = 0.55$; $p = 0.45$
- $H_T = 2pq = 2(0.55)(0.45) = 0.495$

Bottom Line:
$$F_{ST} = \frac{H_T - H_S}{H_T} = \frac{0.495 - 0.425}{0.495} = 0.14$$
- 14% of the total variation in flower color alleles is due to variation among populations
- Equivalently, expected heterozygosity in the merged population ($H_T$) exceeds the subpopulation average ($H_S$) by 14% of $H_T$ (Wahlund Effect)
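The flower-color calculation can be reproduced in a few lines. This is a sketch under the same assumptions as the slides (each subpopulation, and the merged sample, is treated as being in Hardy-Weinberg proportions so that q is the square root of the white-phenotype frequency); the helper names are hypothetical. Because it carries full precision instead of the rounded intermediate values, it gives an $F_{ST}$ of about 0.146 instead of 0.14.

```python
import math


def recessive_allele_freq(n_recessive, n_total):
    """Estimate q from the recessive phenotype frequency, assuming q^2 = frequency (HWE)."""
    return math.sqrt(n_recessive / n_total)


def expected_het(q):
    """Expected heterozygosity 2pq for a biallelic locus."""
    return 2 * (1 - q) * q


# (white, total) counts for the two subpopulations on the slides.
subpops = [(10, 20), (2, 20)]

q_sub = [recessive_allele_freq(w, n) for w, n in subpops]
h_s = sum(expected_het(q) for q in q_sub) / len(q_sub)

# Merge the subpopulations and repeat the estimate for H_T.
white_total = sum(w for w, _ in subpops)
n_total = sum(n for _, n in subpops)
h_t = expected_het(recessive_allele_freq(white_total, n_total))

f_st = (h_t - h_s) / h_t
print(f"H_S={h_s:.3f}  H_T={h_t:.3f}  F_ST={f_st:.3f}")
```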

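Nei's Gene Diversity: $G_{ST}$
Nei's generalization of $F_{ST}$ to multiple, multiallelic loci:
$$G_{ST} = \frac{D_{ST}}{H_T}, \qquad D_{ST} = H_T - H_S$$
$$H_S = \frac{1}{m} \sum_{i=1}^{m} \left( 1 - \sum_{j=1}^{n} p_{ij}^2 \right)$$
where $H_S$ is the mean $H_E$ of $m$ subpopulations, calculated for $n$ alleles, and $p_{ij}$ is the frequency of allele $j$ in subpopulation $i$
$$H_T = 1 - \sum_{j=1}^{n} \bar{p}_j^2$$
where $\bar{p}_j$ is the mean frequency of allele $j$ over all subpopulations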
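As a rough illustration of these definitions, the sketch below computes $H_S$, $H_T$, and $G_{ST}$ for a single multiallelic locus. The function names and allele frequencies are hypothetical, subpopulations are weighted equally, and extending to several loci (for example by averaging $H_S$ and $H_T$ across loci) is left out.

```python
def gene_diversity(freqs):
    """Expected heterozygosity 1 - sum(p_j^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(subpop_freqs):
    """Return (G_ST, H_S, H_T) for one locus; rows are subpopulations, columns alleles."""
    m = len(subpop_freqs)
    n = len(subpop_freqs[0])
    h_s = sum(gene_diversity(f) for f in subpop_freqs) / m
    # Unweighted mean frequency of each allele over all subpopulations.
    mean_freqs = [sum(f[j] for f in subpop_freqs) / m for j in range(n)]
    h_t = gene_diversity(mean_freqs)
    d_st = h_t - h_s
    return d_st / h_t, h_s, h_t


# Hypothetical frequencies of three alleles in two subpopulations.
freqs = [[0.5, 0.3, 0.2],
         [0.1, 0.6, 0.3]]
g_st, h_s, h_t = nei_gst(freqs)
print(f"H_S={h_s:.3f}  H_T={h_t:.3f}  G_ST={g_st:.3f}")
```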

Unbiased Estimate of $F_{ST}$
- Weir and Cockerham's (1984) Theta
- Compensates for sampling error, which can cause large biases in $F_{ST}$ or $G_{ST}$ (e.g., if sample represents different proportions of populations)
- Calculated in terms of correlation coefficients
- Calculated by FSTAT software: http://www2.unil.ch/popgen/softwares/fstat.htm
- Often simply referred to as $F_{ST}$ in the literature

Goudet, J. 1995. FSTAT (Version 1.2): A computer program to calculate F-statistics. Journal of Heredity 86(6):485-486.
Weir, B.S. and C.C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.

Linanthus parryae population structure
- Annual plant in Mojave desert is classic example of migration vs drift
- Allele for blue flower color is recessive
- Use F-statistics to partition variation among regions, subpopulations, and individuals
- $F_{ST}$ can be calculated for any hierarchy:
  - $F_{RT}$: Variation due to differentiation of regions
  - $F_{SR}$: Variation due to differentiation among subpopulations within regions
Schemske and Bierzychudek 2007 Evolution

Linanthus parryae population structure (Hartl and Clark 2007)
$$H_S = \frac{1}{30} \sum_{i=1}^{30} \left( 1 - \sum_{m} p_{im}^2 \right)$$
$$H_R = \frac{1}{N} \sum_{r=1}^{3} N_r \left( 1 - \sum_{m} p_{rm}^2 \right)$$
$$H_T = 1 - \sum_{m} p_m^2$$
$$F_{SR} = \frac{H_R - H_S}{H_R} = \frac{0.1589 - 0.1424}{0.1589} = 0.1036$$
$$F_{RT} = \frac{H_T - H_R}{H_T} = \frac{0.2371 - 0.1589}{0.2371} = 0.3299$$
$$F_{ST} = \frac{H_T - H_S}{H_T} = \frac{0.2371 - 0.1424}{0.2371} = 0.3993$$
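The hierarchical coefficients follow from the three heterozygosities alone. The sketch below plugs in the $H_S$, $H_R$, and $H_T$ values reported on the slide; the variable names are my own. Results may differ from the slide in the last digit, presumably because the inputs here are already rounded to four decimals.

```python
# Heterozygosities reported on the slide (subpopulation, region, total).
h_s, h_r, h_t = 0.1424, 0.1589, 0.2371

f_sr = (h_r - h_s) / h_r   # subpopulations within regions
f_rt = (h_t - h_r) / h_t   # regions within the total
f_st = (h_t - h_s) / h_t   # subpopulations within the total
print(f"F_SR={f_sr:.4f}  F_RT={f_rt:.4f}  F_ST={f_st:.4f}")

# The levels combine multiplicatively: 1 - F_ST = (1 - F_SR)(1 - F_RT).
assert abs((1 - f_st) - (1 - f_sr) * (1 - f_rt)) < 1e-12
```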

$F_{ST}$ as Variance Partitioning
Think of $F_{ST}$ as proportion of genetic variation partitioned among populations:
$$F_{ST} = \frac{V(q)}{\bar{p}\,\bar{q}}$$
- $V(q)$ is the variance of $q$ across subpopulations
- The denominator is the maximum amount of variance that could occur among subpopulations
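For a biallelic locus with equally weighted subpopulations, the variance form and the heterozygosity form give the same number, provided $H_T$ is computed from the mean allele frequency. The following sketch checks this with hypothetical frequencies; the function names are illustrative.

```python
def fst_from_variance(q_values):
    """F_ST = V(q) / (p_bar * q_bar), using the population variance of q."""
    k = len(q_values)
    q_bar = sum(q_values) / k
    var_q = sum((q - q_bar) ** 2 for q in q_values) / k
    return var_q / ((1 - q_bar) * q_bar)


def fst_from_heterozygosity(q_values):
    """F_ST = (H_T - H_S) / H_T with H_T based on the mean allele frequency."""
    k = len(q_values)
    q_bar = sum(q_values) / k
    h_s = sum(2 * q * (1 - q) for q in q_values) / k
    h_t = 2 * q_bar * (1 - q_bar)
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two subpopulations.
q_values = [0.2, 0.6]
fst_var = fst_from_variance(q_values)
fst_het = fst_from_heterozygosity(q_values)
print(f"variance form: {fst_var:.4f}  heterozygosity form: {fst_het:.4f}")
```

The agreement follows from $H_T - H_S = 2V(q)$ under these assumptions.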

Analysis of Molecular Variance (AMOVA)
- Analogous to Analysis of Variance (ANOVA)
- Use pairwise genetic distances as response
- Test significance using permutations
- Partition genetic diversity into different hierarchical levels, including regions, subpopulations, individuals
- Many types of marker data can be used
- Method of choice for dominant markers, sequences, and SNPs

Phi Statistics from AMOVA
$$\phi_{CT} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$$
Correlation of random pairs of haplotypes drawn from a region relative to pairs drawn from the whole population ($F_{RT}$)
$$\phi_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}$$
Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from a region ($F_{SR}$)
$$\phi_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$$
Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from the whole population ($F_{ST}$)
http://www.bioss.ac.uk/smart/unix/mamova/slides/frames.htm
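Once the variance components are estimated, the phi statistics are simple ratios. The sketch below assumes the components are already available (it does not perform the AMOVA itself); the function name and the component values are hypothetical.

```python
def phi_statistics(var_a, var_b, var_c):
    """Return (phi_CT, phi_SC, phi_ST) from AMOVA variance components.

    var_a: among regions; var_b: among subpopulations within regions;
    var_c: within subpopulations.
    """
    total = var_a + var_b + var_c
    phi_ct = var_a / total
    phi_sc = var_b / (var_b + var_c)
    phi_st = (var_a + var_b) / total
    return phi_ct, phi_sc, phi_st


# Hypothetical variance components (illustrative only).
phi_ct, phi_sc, phi_st = phi_statistics(2.0, 1.0, 7.0)
print(f"phi_CT={phi_ct:.3f}  phi_SC={phi_sc:.3f}  phi_ST={phi_st:.3f}")

# Same multiplicative structure as the F-coefficients.
assert abs((1 - phi_st) - (1 - phi_sc) * (1 - phi_ct)) < 1e-12
```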

What if you don't know how your samples are organized into populations (i.e., you don't know how many source populations you have)?
What if reference samples aren't from a single population?
What if they are offspring from parents coming from different source populations (admixture)?

What's a population anyway?

Defining populations on genetic criteria
- Assume subpopulations are at Hardy-Weinberg Equilibrium and linkage equilibrium
- Probabilistically assign individuals to populations to minimize departures from equilibrium
- Can allow for admixture (individuals with different proportions of each population) and geographic information
- Bayesian approach using a Markov chain Monte Carlo (MCMC) method to explore parameter space
- Implemented in STRUCTURE program
Londo and Schaal 2007 Mol Ecol 16:4523
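To make the assignment idea concrete, here is a deliberately simplified sketch. It is not the STRUCTURE algorithm: it assumes the allele frequencies of each candidate population are already known, uses a uniform prior, and handles only biallelic loci with no admixture. All names and numbers are hypothetical. Hardy-Weinberg equilibrium gives the per-locus genotype probability, and linkage equilibrium justifies multiplying across loci.

```python
import math


def hwe_genotype_prob(g, p):
    """P(genotype | p) under HWE, where g is the number of copies (0, 1, 2) of the allele."""
    return math.comb(2, g) * p ** g * (1 - p) ** (2 - g)


def assignment_posterior(genotype, pop_freqs):
    """Posterior probability of each population for one individual (uniform prior)."""
    likelihoods = []
    for freqs in pop_freqs:
        like = 1.0
        # Linkage equilibrium: loci are independent, so probabilities multiply.
        for g, p in zip(genotype, freqs):
            like *= hwe_genotype_prob(g, p)
        likelihoods.append(like)
    total = sum(likelihoods)
    return [like / total for like in likelihoods]


# Hypothetical allele frequencies at three loci in two populations.
pop_freqs = [[0.9, 0.8, 0.7],
             [0.2, 0.3, 0.1]]
individual = [2, 2, 1]  # allele counts at the three loci
posterior = assignment_posterior(individual, pop_freqs)
print([round(x, 4) for x in posterior])
```

In the full method the allele frequencies and the assignments are both unknown and are explored jointly by MCMC; this sketch fixes the frequencies only to keep the likelihood step visible.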

Example: Taita Thrush data*
- Three main sampling locations in Kenya
- Low migration rates (radio-tagging study)
- 155 individuals, genotyped at 7 microsatellite loci
* Slide courtesy of Jonathan Pritchard

Estimating K
Structure is run separately at different values of K. The program computes a statistic that measures the fit of each value of K (sort of a penalized likelihood); this can be used to help select K.

Taita thrush data:
Assumed value of K | Posterior probability of K
1 | ~0
2 | ~0
3 | 0.993
4 | 0.007
5 | 0.00005
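One way to turn a per-K fit statistic into posterior probabilities like those on this slide is Bayes' rule with a uniform prior on K. The sketch below assumes each run supplies an estimate of $\ln P(X \mid K)$; the log values are hypothetical and are not the Taita thrush output, and the function name is my own.

```python
import math


def posterior_over_k(log_probs):
    """Posterior P(K | X) from estimates of ln P(X | K), assuming a uniform prior on K."""
    peak = max(log_probs.values())
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    weights = {k: math.exp(lp - peak) for k, lp in log_probs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical estimates of ln P(X | K) for K = 1..5 (illustrative only).
log_probs = {1: -3000.0, 2: -2900.0, 3: -2800.0, 4: -2805.0, 5: -2810.0}
posterior = posterior_over_k(log_probs)
best_k = max(posterior, key=posterior.get)
for k in sorted(posterior):
    print(k, f"{posterior[k]:.5f}")
print("best K:", best_k)
```

Because differences of only a few log units translate into near-certain posteriors, these values are usually read as a rough guide to K, not as exact probabilities.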

Another method for inference of K
The ΔK method of Evanno et al. (2005, Mol. Ecol. 14: 2611-2620)
Eckert, Population Structure, 5-Aug-2008

Inferred population structure
Figure labels: Africans, Europeans, MidEast, Cent/S Asia, Asia, Oceania, America
"Each individual is a thin vertical line that is partitioned into K colored segments according to its membership coefficients in K clusters."
Rosenberg et al. 2002 Science 298: 2381-2385

Inferred population structure: regions
Rosenberg et al. 2002 Science 298: 2381-2385