26th Oct 2015. Population Genetics II (Selection + Haplotype analyses). Gurinder Singh Mickey Atwal, Center for Quantitative Biology
Natural Selection Model (Molecular Evolution): allele frequency in embryos → selection → allele frequency in adults, over one generation.
Example of natural selection in mice: p53 (Hu et al (2007))
[Figure: implantation sites at day 5 after fertilization of the egg in p53+/+ mice, p53-/- mice, and p53-/- mice given an LIF injection.]
Genotype of C57BL/6J mice: Male | Female | LIF injection | Implantation sites (average ± se) | Number of recovered blastocysts (average ± se) | n
+/+ | +/+ | - | 8.4 ± 0.5 | 0 | 5
-/- | -/- | - | 2.7 ± 0.8 | 3.2 ± 0.6 | 6
-/- | -/- | + | 7 ± 0.8 | 0.6 ± 0.6 | 3
Hardy-Weinberg Law. Consider 2 alleles (A, a): allele frequency of A = p, allele frequency of a = q = 1 - p. In a randomly-mating, large diploid population with no mutation, migration, selection or drift, the genotype frequencies are:
Genotype: AA | Aa | aa
Hardy-Weinberg frequency: $p^2$ | $2pq$ | $q^2$
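A minimal Python sketch of the Hardy-Weinberg proportions, assuming the idealized conditions listed above. The function names are hypothetical. The sketch also checks that the A allele frequency recovered from the genotype frequencies is still p.

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Genotype frequencies for allele frequencies p (A) and q = 1 - p (a)."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_freq_A(genotypes: dict[str, float]) -> float:
    """A frequency from genotype frequencies: AA carries two A alleles, Aa carries one."""
    return genotypes["AA"] + 0.5 * genotypes["Aa"]


freqs = hardy_weinberg(0.3)
print(freqs)                 # approximately AA 0.09, Aa 0.42, aa 0.49
print(allele_freq_A(freqs))  # approximately 0.3: random mating leaves p unchanged
```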
Fitness
Genotype: AA | Aa | aa
Newborn frequency: $p^2$ | $2pq$ | $q^2$
Fitness: $w_{AA}$ | $w_{Aa}$ | $w_{aa}$
Relative fitness: $w_{AA}/w_{AA} = 1$ | $w_{Aa}/w_{AA} = 1 - hs$ | $w_{aa}/w_{AA} = 1 - s$
Frequency after selection: $p^2/\bar{w}$ | $2pq(1 - hs)/\bar{w}$ | $q^2(1 - s)/\bar{w}$
s = selection coefficient (relative viability of AA over aa); h = heterozygous effect; $\bar{w}$ = mean relative fitness
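Mean Relative Fitness of Population
Mean fitness: $\bar{W} = p^2 w_{AA} + 2pq\,w_{Aa} + q^2 w_{aa}$
Mean relative fitness: $\bar{w} = \bar{W}/w_{AA} = 1 - 2pqhs - q^2 s$
When AA is the fittest genotype, $\bar{w} \le 1$ and the Genetic Load $L = 1 - \bar{w}$ satisfies $0 \le L \le 1$.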
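A minimal sketch of one round of viability selection. It uses the relative fitnesses 1, 1 - hs and 1 - s for AA, Aa and aa. The helper names and the parameter values (p = 0.5, s = 0.2, h = 0.5) are illustrative assumptions. Dividing by the mean relative fitness makes the post-selection frequencies sum to 1.

```python
def mean_relative_fitness(p: float, s: float, h: float) -> float:
    """w-bar = 1 - 2pq*h*s - q^2*s (fitnesses relative to AA)."""
    q = 1.0 - p
    return 1.0 - 2 * p * q * h * s - q * q * s


def genotypes_after_selection(p: float, s: float, h: float) -> dict[str, float]:
    """Newborn (Hardy-Weinberg) frequency times relative fitness, divided by w-bar."""
    q = 1.0 - p
    wbar = mean_relative_fitness(p, s, h)
    return {
        "AA": p * p / wbar,
        "Aa": 2 * p * q * (1 - h * s) / wbar,
        "aa": q * q * (1 - s) / wbar,
    }


def genetic_load(p: float, s: float, h: float) -> float:
    """L = 1 - w-bar."""
    return 1.0 - mean_relative_fitness(p, s, h)


adults = genotypes_after_selection(p=0.5, s=0.2, h=0.5)
p_next = adults["AA"] + 0.5 * adults["Aa"]  # A frequency among surviving adults
print(adults, p_next, genetic_load(0.5, 0.2, 0.5))
```

With these values $\bar w = 0.9$. The genetic load is therefore 0.1, and the A frequency among adults rises from 0.5 to about 0.528.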
Heterozygous advantage
• h = 0: A dominant, a recessive
• h = 1: A recessive, a dominant
• 0 < h < 1: incomplete dominance
• h < 0: overdominance
• h > 1: underdominance
h determines the equilibrium allele frequency; s determines how fast the equilibrium is reached.
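The following short derivation shows why h alone fixes the equilibrium. It uses the relative fitnesses 1, 1 - hs and 1 - s. The marginal fitnesses $w_A$ and $w_a$ of the two alleles are notation introduced here. An interior equilibrium requires $w_A = w_a$:

```latex
\begin{aligned}
w_A &= p + q(1 - hs), \qquad w_a = p(1 - hs) + q(1 - s) \\
w_A = w_a \;&\Longrightarrow\; -qhs = -phs - qs \;\Longrightarrow\; qh = ph + q \\
&\Longrightarrow\; \hat{q} = \frac{h}{2h - 1}, \qquad \hat{p} = 1 - \hat{q} = \frac{1 - h}{1 - 2h}
\end{aligned}
```

The selection coefficient s cancels, so it only sets the speed of approach. The equilibrium $\hat p$ lies strictly between 0 and 1 only when h < 0 (overdominance) or h > 1 (underdominance). For example, h = -1 gives $\hat p = 2/3$ and h = 2 gives $\hat p = 1/3$.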
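Fundamental Theorem of Natural Selection (R. Fisher, 1958): the change in mean fitness is proportional to the additive genetic variance. For additive selection (h = 1/2), the additive genetic variance in relative fitness is $V_A = pqs^2/2$, and
$\bar{w}' - \bar{w} = \dfrac{pqs^2}{2\bar{w}} = \dfrac{V_A}{\bar{w}}$, where $\bar{w}'$ = mean fitness in the next generation.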
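The sketch below checks the identity numerically for the additive case h = 1/2, which is the case it assumes; there $\bar w = 1 - qs$. The helper names are hypothetical. `observed` is the exact one-generation change in mean relative fitness, and `predicted` is $pqs^2/(2\bar w)$.

```python
def mean_fitness(p: float, s: float, h: float) -> float:
    q = 1.0 - p
    return 1.0 - 2 * p * q * h * s - q * q * s


def next_p(p: float, s: float, h: float) -> float:
    """A frequency after one generation of selection: (p^2 + pq(1 - hs)) / w-bar."""
    q = 1.0 - p
    return (p * p + p * q * (1 - h * s)) / mean_fitness(p, s, h)


def fisher_check(p: float, s: float) -> tuple[float, float]:
    """(observed change in w-bar, p q s^2 / (2 w-bar)) for additive selection, h = 1/2."""
    h = 0.5
    wbar = mean_fitness(p, s, h)
    observed = mean_fitness(next_p(p, s, h), s, h) - wbar
    predicted = p * (1 - p) * s * s / (2 * wbar)
    return observed, predicted


print(fisher_check(0.3, 0.1))  # the two numbers agree up to floating-point rounding
```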
Types of selection
• Directional selection (0 < h < 1) causes p to go to 1: conventional Darwinian natural selection.
• Balancing selection (h < 0) causes p to go to some equilibrium value $p_e$. For example, the heterozygous variant of the HBB gene confers resistance to the malaria pathogen (Plasmodium falciparum).
• Disruptive selection (h > 1): if $p < p_e$ then p goes to 0; if $p > p_e$ then p goes to 1.
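A minimal sketch that iterates the one-generation selection recursion for each regime. The value s = 0.1, the chosen h values and the starting frequencies are arbitrary illustrative choices. For h = -1 the equilibrium is 2/3, and for h = 2 it is 1/3; both follow from setting the marginal fitnesses of A and a equal.

```python
def next_p(p: float, s: float, h: float) -> float:
    """One generation of viability selection; relative fitnesses AA: 1, Aa: 1 - hs, aa: 1 - s."""
    q = 1.0 - p
    wbar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
    return (p * p + p * q * (1 - h * s)) / wbar


def iterate(p0: float, s: float, h: float, generations: int = 5000) -> float:
    """Apply next_p repeatedly and return the final A frequency."""
    p = p0
    for _ in range(generations):
        p = next_p(p, s, h)
    return p


s = 0.1
print(iterate(0.01, s, h=0.5))   # directional: p -> 1
print(iterate(0.01, s, h=-1.0))  # balancing: p -> 2/3
print(iterate(0.30, s, h=2.0))   # disruptive, start below 1/3: p -> 0
print(iterate(0.40, s, h=2.0))   # disruptive, start above 1/3: p -> 1
```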
Example of human directional selection (P C Sabeti et al. Science 2006;312:1614-1620): the FY*O allele in the promoter of the Duffy antigen gene confers resistance to Plasmodium vivax malaria. It is prevalent, and even fixed, in many African populations.
What about drift? Drift is very important in small populations; its effect depends on the relative sizes of s and 1/(2N).
For example, allele A has a selective advantage over allele a with selection coefficient s: $w_{aa}/w_{AA} = 1 - s$ (additive fitness, h = 1/2). A neutral new mutant fixes with probability 1/(2N). Under selection:
• In an initial population consisting entirely of aa genotypes, the probability of a new mutant A fixing is $\dfrac{1 - e^{-s}}{1 - e^{-2Ns}}$.
• In an initial population consisting entirely of AA genotypes, the probability of a new mutant a fixing is $\dfrac{e^{s} - 1}{e^{2Ns} - 1} > 0$.
Therefore, even deleterious alleles can become fixed in a small population!
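A short sketch that evaluates the two fixation probabilities above and compares them with the neutral value 1/(2N). Like the formulas, it assumes additive fitness and treats N as the effective population size. The parameter values are illustrative. `math.expm1` (which computes e^x - 1) keeps the result accurate when the exponent is small.

```python
import math


def fixation_prob_beneficial(N: int, s: float) -> float:
    """Single new A mutant in an all-aa population: (1 - e^{-s}) / (1 - e^{-2Ns})."""
    return math.expm1(-s) / math.expm1(-2 * N * s)


def fixation_prob_deleterious(N: int, s: float) -> float:
    """Single new a mutant in an all-AA population: (e^{s} - 1) / (e^{2Ns} - 1)."""
    return math.expm1(s) / math.expm1(2 * N * s)


s = 0.01
for N in (50, 500, 5000):
    neutral = 1 / (2 * N)
    print(N, fixation_prob_beneficial(N, s), neutral, fixation_prob_deleterious(N, s))
```

For N = 50 the deleterious allele still fixes with probability of the same order as a neutral one. For N = 5000 its fixation probability is vanishingly small.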
Detecting Natural Selection in the Human Genome: e.g. the McDonald-Kreitman test, the Tajima's D test (P C Sabeti et al. Science 2006;312:1614-1620). The choice of selection test depends on the time scale of evolution.
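Tajima's D compares two estimates of diversity. The first is the mean number of pairwise differences (π). The second is Watterson's estimate based on the number of segregating sites, S / a1. The sketch below uses the standard formula, and the toy sequences are made up. A negative D (an excess of rare variants) is consistent with, for example, a recent selective sweep or population expansion. A positive D is consistent with balancing selection or a bottleneck. The statistic alone therefore does not identify the cause.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for n >= 4 aligned sequences of equal length (one character per site)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    # S: number of segregating (polymorphic) sites
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences over all n(n-1)/2 pairs
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)) / n_pairs
    # Standard normalising constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical data: 10 sequences, 5 sites, every variant a singleton
singletons = ["".join("1" if j == i else "0" for j in range(5)) for i in range(10)]
# Hypothetical data: 10 sequences, 3 sites, every variant at 50% frequency
balanced = ["111"] * 5 + ["000"] * 5
print(tajimas_d(singletons), tajimas_d(balanced))  # negative, then positive
```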
HAPLOTYPE STUDIES
Haplotype
• Sequence of contiguous SNP alleles on a chromatid
• Hard to determine directly across the whole genome
• Usually only the genotypes are provided, giving ambiguous haplotypes
• Haplotypes are usually inferred ("phased") by statistical computation
• Newer experimental methods can directly phase haplotypes, but are costly
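Typical Results of Genotype Assays (rows: cell lines / patients; columns: SNPs 1 to 10)
Cell line | SNP 1 | SNP 2 | SNP 3 | SNP 4 | SNP 5 | SNP 6 | SNP 7 | SNP 8 | SNP 9 | SNP 10
6023 | T/T | G/G | A/A | C/C | A/A | A/A | C/C | G/G | C/C | G/G
6031 | T/T | G/G | A/A | C/C | A/A | A/A | C/G | G/G | C/C | G/G
6032 | C/C | A/A | C/C | T/T | C/C | C/C | C/G | A/A | G/G | A/A
6033 | C/T | A/G | A/C | C/T | A/C | A/C | C/G | A/G | C/G | A/G
6034 | T/T | G/G | A/A | C/C | A/A | A/A | C/G | G/G | C/C | G/G
6046 | T/T | G/G | A/A | C/C | A/A | A/A | C/G | G/G | C/C | G/G
6047 | C/T | A/G | A/C | C/T | A/C | A/C | C/C | A/G | C/G | A/G
6048 | C/T | A/G | A/C | C/T | A/C | A/C | C/G | A/G | C/G | A/G
6053 | C/C | A/A | A/A | T/T | C/C | C/C | C/G | A/A | G/G | A/A
6054 | T/T | G/G | A/A | C/C | A/A | A/A | C/G | G/G | C/C | G/G
6055 | C/T | A/G | A/C | failed | A/C | A/C | C/G | A/G | C/G | A/G
6056 | C/T | A/G | A/C | C/T | A/C | A/C | C/G | A/G | C/G | A/G
6057 | C/T | A/G | A/C | C/T | A/C | A/C | C/G | A/G | C/G | A/G
6060 | C/T | A/G | A/C | C/T | A/C | A/C | failed | A/G | C/G | A/G
6061 | C/C | A/A | C/C | T/T | C/C | C/C | C/G | A/A | G/G | A/A
6067 | T/T | G/G | A/A | C/C | A/A | A/A | C/C | G/G | C/C | G/G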
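A small sketch of why unphased genotypes give ambiguous haplotypes. It lists every unordered pair of haplotypes consistent with a row of genotype calls. The function name is hypothetical. The sketch assumes every call is a clean "X/Y" string, so "failed" calls would have to be dropped first. A sample that is heterozygous at k sites has $2^{k-1}$ candidate pairs. The example rows use the calls for cell lines 6033 and 6023 from the table above, written out in full.

```python
from itertools import product


def compatible_phasings(genotypes: list[str]) -> list[tuple[str, str]]:
    """All unordered haplotype pairs consistent with unphased 'X/Y' genotype calls."""
    # Only heterozygous sites are ambiguous with respect to phase
    het_sites = [i for i, g in enumerate(genotypes) if g[0] != g[2]]
    pairs = set()
    for flips in product((False, True), repeat=len(het_sites)):
        h1 = [g[0] for g in genotypes]
        h2 = [g[2] for g in genotypes]
        for site, flip in zip(het_sites, flips):
            if flip:  # put the second allele on haplotype 1 at this site
                h1[site], h2[site] = h2[site], h1[site]
        # Sort each pair so (h1, h2) and (h2, h1) are counted once
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


cell_6033 = "C/T A/G A/C C/T A/C A/C C/G A/G C/G A/G".split()  # heterozygous at all 10 SNPs
cell_6023 = "T/T G/G A/A C/C A/A A/A C/C G/G C/C G/G".split()  # homozygous at all 10 SNPs
print(len(compatible_phasings(cell_6033)))  # 512 = 2**9 candidate pairs
print(len(compatible_phasings(cell_6023)))  # 1: no ambiguity
```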
Linkage Disequilibrium
• Linkage Disequilibrium (LD) = correlation of nucleotide alleles at different loci across the population
  ◦ On average, there is strong LD between nearby alleles on the same chromosome
• Linkage Equilibrium = random association (independence) of alleles at different loci across the population
• LD reflects many factors of population history
• LD permits us to use proxy SNPs as diagnostic biomarkers for disease-causing mutations
Population history and SNP correlations
[Figure: present-day chromosomes carrying neutral and disease mutations that occurred at various times in the population's history, from past to present time.]
New haplotypes generated by mutations
[Figure: an ancestral chromosome with two loci shown. A mutation at locus 1, and separately a mutation at locus 2 on the ancestral chromosome, each create a new haplotype.]
Intra-chromosomal recombination
[Figure: haplotypes 1, 2 and 3 before and after recombination.] Recombination between haplotypes 2 and 3 generates a new haplotype from existing mutations.
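A minimal sketch of a single crossover between two haplotypes. The two-locus haplotypes "CT" and "AG" are hypothetical: each carries one derived allele, C at locus 1 and G at locus 2. The function name is also hypothetical.

```python
def crossover(h1: str, h2: str, point: int) -> tuple[str, str]:
    """Single crossover between loci point-1 and point: swap the tails of the two haplotypes."""
    return h1[:point] + h2[point:], h2[:point] + h1[point:]


print(crossover("CT", "AG", 1))  # ('CG', 'AT')
```

The recombinant "CG" combines both derived alleles, a haplotype that neither parent chromosome carried. No new mutation was needed to produce it.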
Quantifying linkage disequilibrium
• From the population haplotype frequencies we can calculate the correlations between SNPs.
• Commonly used LD summaries:
  ◦ D
  ◦ Lewontin's D'
  ◦ $r^2$
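Haplotype frequencies: haplotypes with 2 SNPs (locus 1 alleles A/a, locus 2 alleles B/b)
LOCUS 1 \ LOCUS 2 | Allele B | Allele b | Totals
Allele A | $p_{AB}$ | $p_{Ab}$ | $p_A$
Allele a | $p_{aB}$ | $p_{ab}$ | $p_a$
Totals | $p_B$ | $p_b$ | 1.0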
Linkage Equilibrium definition
$p_{AB} = p_A p_B$, $p_{Ab} = p_A(1 - p_B)$, $p_{aB} = (1 - p_A)p_B$, $p_{ab} = (1 - p_A)(1 - p_B)$
Random association of alleles; expected for SNPs at distant loci.
Linkage Disequilibrium definition
$p_{AB} \neq p_A p_B$, $p_{Ab} \neq p_A(1 - p_B)$, $p_{aB} \neq (1 - p_A)p_B$, $p_{ab} \neq (1 - p_A)(1 - p_B)$
Non-random association of alleles; expected for SNPs at nearby loci.
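LD measure: D
Deviation from linkage equilibrium: $D = p_{AB} - p_A p_B$
It can be shown that all 4 of the 2-SNP haplotype frequencies can be expressed in terms of D, $p_A$ and $p_B$ only, i.e.
$p_{AB} = p_A p_B + D$
$p_{Ab} = p_A(1 - p_B) - D$
$p_{aB} = (1 - p_A)p_B - D$
$p_{ab} = (1 - p_A)(1 - p_B) + D$
Note also $D = p_{AB}\,p_{ab} - p_{Ab}\,p_{aB}$.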
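A minimal sketch that computes D from the four haplotype frequencies, checks the identity $D = p_{AB} p_{ab} - p_{Ab} p_{aB}$, and rebuilds the frequencies from D, $p_A$ and $p_B$. The haplotype frequencies (0.4, 0.1, 0.1, 0.4) are hypothetical, and so are the function names.

```python
def ld_D(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """D = p_AB - p_A p_B from the four two-SNP haplotype frequencies (summing to 1)."""
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    return p_AB - p_A * p_B


def haplotypes_from_D(p_A: float, p_B: float, D: float) -> tuple[float, float, float, float]:
    """Rebuild (p_AB, p_Ab, p_aB, p_ab) from the allele frequencies and D."""
    return (p_A * p_B + D,
            p_A * (1 - p_B) - D,
            (1 - p_A) * p_B - D,
            (1 - p_A) * (1 - p_B) + D)


freqs = (0.4, 0.1, 0.1, 0.4)
D = ld_D(*freqs)
print(D)                                          # approximately 0.15
print(freqs[0] * freqs[3] - freqs[1] * freqs[2])  # same value via the product identity
print(haplotypes_from_D(0.5, 0.5, D))             # recovers the four frequencies
```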
LD measure: Lewontin's D'
Normalized version of D: $D' = D / D_{max}$, where
$D_{max} = \min[p_A p_b,\; p_a p_B]$ if D > 0, and $D_{max} = \min[p_A p_B,\; p_a p_b]$ if D < 0
• D' ranges between -1 and 1
• directly related to recombination fraction
• D' = 0 if linkage equilibrium
• |D'| = 1 if only 2 or 3 haplotypes are present out of the possible 4
• D' is upwardly biased in small samples
LD measure: $r^2$
Square of the correlation coefficient: $r^2 = \dfrac{D^2}{p_A p_a p_B p_b}$
• ranges between 0 and 1
• useful in association mapping
• $r^2 = 0$ if linkage equilibrium
• $r^2 = 1$ if only 2 haplotypes are present (e.g. only AB and ab)
• proportional to the mutual information between 2 loci when D is small
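A minimal sketch computing D, Lewontin's D' and $r^2$ from the four haplotype frequencies. The function name and the example frequencies are hypothetical. The sketch assumes both loci are polymorphic, since otherwise $r^2$ is undefined.

```python
def ld_summaries(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) from the four two-SNP haplotype frequencies."""
    p_A, p_B = p_AB + p_Ab, p_AB + p_aB
    p_a, p_b = 1 - p_A, 1 - p_B
    if min(p_A, p_a, p_B, p_b) <= 0:
        raise ValueError("both loci must be polymorphic")
    D = p_AB - p_A * p_B
    if D > 0:
        D_max = min(p_A * p_b, p_a * p_B)
    elif D < 0:
        D_max = min(p_A * p_B, p_a * p_b)
    else:
        return 0.0, 0.0, 0.0  # linkage equilibrium
    r2 = D * D / (p_A * p_a * p_B * p_b)
    return D, D / D_max, r2


print(ld_summaries(0.5, 0.0, 0.0, 0.5))    # only AB and ab: D' = 1, r^2 = 1
print(ld_summaries(0.5, 0.25, 0.0, 0.25))  # 3 haplotypes: D' = 1, r^2 = 1/3
print(ld_summaries(0.0, 0.5, 0.5, 0.0))    # only Ab and aB: D' = -1, r^2 = 1
```

The three-haplotype case shows the two measures diverging: D' saturates at 1 while $r^2$ is only 1/3.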
Factors affecting Linkage Disequilibrium
Increase LD (decrease the number, or variability, of haplotypes):
• Finite sampling (drift)
• Demographic bottleneck
• Selection
• Emigration
Decrease LD (increase the number, or variability, of haplotypes):
• Immigration
• Recombination
How does LD decay over time?
• Recombination reduces the correlation between SNPs
[Figure: haplotype frequencies at time t, with allele frequencies $P_A$, $P_a$, $P_B$, $P_b$ and haplotype frequency $P_{AB}$.]
Decay of linkage disequilibrium in a large population
• The frequency of AB in the new generation (time t+1) depends on the frequencies of AB, A and B in the old generation (time t). It also depends on the recombination rate c. Recombination leaves the allele frequencies unchanged:
$p_{AB}^{t+1} = (1 - c)\,p_{AB}^{t} + c\,p_A p_B = (1 - c)(p_A p_B + D_t) + c\,p_A p_B = p_A p_B + (1 - c)D_t$
Therefore $D_{t+1} = (1 - c)D_t$, and $D_{t+n} = D_t(1 - c)^n \approx D_t e^{-cn}$ (for small c).
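A minimal sketch that iterates the recursion for $p_{AB}$ and compares the result with the closed form $(1 - c)^n$ and the exponential approximation. The starting LD, the allele frequencies and c are hypothetical values, and so is the function name.

```python
import math


def decay_D(D0: float, c: float, generations: int, p_A: float = 0.5, p_B: float = 0.5) -> float:
    """Iterate p_AB(t+1) = (1 - c) p_AB(t) + c p_A p_B; return D after `generations` steps."""
    p_AB = p_A * p_B + D0
    for _ in range(generations):
        p_AB = (1 - c) * p_AB + c * p_A * p_B  # allele frequencies p_A, p_B stay fixed
    return p_AB - p_A * p_B


D0, c, n = 0.15, 0.01, 200
print(decay_D(D0, c, n))      # iterated recursion
print(D0 * (1 - c) ** n)      # closed form D_t (1 - c)^n
print(D0 * math.exp(-c * n))  # approximation D_t e^{-cn}
```

The first two numbers agree to rounding. The exponential approximation is within about 1% here because c is small.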
Different populations exhibit characteristic LD decay across the genome
[Figure: mean D' versus distance (0 to 150,000 bp) for Caucasian, African-American, Asian and Yoruban samples; Gabriel et al, 2002.]
Finite population size: Recombination-Drift Equilibrium
• The rate of decay of LD by recombination is cancelled out by the rate of increase of LD by drift:
$r^2 \approx \dfrac{1}{1 + 4N_e c d}$
$N_e$ = effective population size (~10,000 for humans); c = recombination rate (per base-pair); d = distance across genome (base-pairs)
$\dfrac{1}{N_e} = \dfrac{1}{T}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2} + \dots + \dfrac{1}{N_T}\right)$
Note that $N_e$ is dominated by the times when population sizes are reduced (population bottleneck).
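A minimal sketch of the two formulas above. The population history (nine generations at 10,000 and one bottleneck generation at 100) is hypothetical. The per-base-pair recombination rate c = 1e-8, roughly 1 cM per Mb, is an assumed ballpark figure for humans. The function names are also hypothetical.

```python
def effective_size(sizes: list[float]) -> float:
    """Harmonic mean: 1/N_e = (1/T) * sum over generations of 1/N_t."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def expected_r2(N_e: float, c: float, d: float) -> float:
    """Recombination-drift equilibrium: r^2 ~ 1 / (1 + 4 N_e c d)."""
    return 1.0 / (1.0 + 4.0 * N_e * c * d)


history = [10_000] * 9 + [100]  # one bottleneck generation
print(effective_size(history))  # about 917, far below the arithmetic mean of 9,010

c = 1e-8  # assumed per-bp recombination rate (about 1 cM/Mb)
for d in (1_000, 10_000, 100_000):
    print(d, expected_r2(10_000, c, d))  # about 0.71, 0.2, 0.024
```

A single bottleneck generation pulls $N_e$ down by an order of magnitude, which is the point of the note about bottlenecks.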