AEC 550 Conservation Genetics Lecture #2: Probability, Random Mating, HW Expectations, & Genetic Diversity


Today:
- Review probability in population genetics
- Review basic statistics
- Population definition
- Random mating and non-overlapping generations models
- Hardy-Weinberg model
- Look at measures of genetic diversity, following Tuesday's talk

Note: some questions in these notes are left blank. Make sure you can answer them after lecture; they are often concepts that are important for a deeper understanding and for your mid-term.

Probability Theory in Population Genetics

The PROBABILITY (P) of an event is the number of times the event occurs (a) divided by the total number of possible events (n):
$P = a/n$

Multiplicative (Product) Rule: If the events A and B are independent, then the probability that they both occur is
$P(A \text{ and } B) = P(A) \times P(B)$
That is, the probability of 2 or more independent events occurring simultaneously is equal to the product of their individual probabilities. For example, the probability of a progeny having the genotype AA at a locus is the frequency of the A allele in the population (denoted p) multiplied by itself, or $p^2$.

Sum Rule: The probability that any one of 2 or more mutually exclusive events occurs is equal to the sum of their individual probabilities:
$P(A \text{ or } B) = P(A) + P(B)$
Using the example above, the probability of one particular way of forming a heterozygote Aa is the product of the two allele frequencies, $pq$. However, there are two mutually exclusive ways to get a heterozygote: an A (frequency p) from the mom and an a (frequency q) from the dad, or an a from the mom and an A from the dad. Summing the two gives $pq + qp = 2pq$.
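The product and sum rules above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `genotype_probabilities` and the example value p = 0.6 are assumptions made for the illustration, not part of the lecture.

```python
def genotype_probabilities(p):
    """Return (P(AA), P(Aa), P(aa)) for allele frequency p of A.

    Assumes the two gametes are drawn independently (product rule).
    """
    q = 1.0 - p
    prob_AA = p * p          # product rule: A from mom AND A from dad
    prob_Aa = p * q + q * p  # sum rule: two mutually exclusive routes to Aa
    prob_aa = q * q          # product rule: a from mom AND a from dad
    return prob_AA, prob_Aa, prob_aa


# Hypothetical allele frequency chosen only for illustration.
aa_big, het, aa_small = genotype_probabilities(0.6)
print(aa_big, het, aa_small)
print("total:", aa_big + het + aa_small)
```

The three probabilities should always sum to 1, because AA, Aa, and aa are the only possible outcomes.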

Conditional probability: the probability of one event given that the other event has occurred.
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$
If A and B are independent, this becomes $\dfrac{P(A) \times P(B)}{P(B)} = P(A)$.

BASIC STATISTICS

Basic Terms:
Population = group of things we are interested in (population of inference)
Sample = subset of the population; typically it is not possible to sample the total population
Random sample = each member has an equal and independent chance of being in the sample
Variable = an attribute common to all members of the population that varies in its realization; these realizations are called varieties
Random variable = a variable measured on the random sample
Continuous variable = metric variable, continuous scale, e.g., height
Discrete variable = meristic variable, countable, e.g., # of leaves, # of digits, integers
Categorical variable = grouped and discrete but not ordered
  Example: categories AA, Aa, aa; discrete number of A alleles
Parameter = numerical summary or constant that describes the entire population of inference
  Example: $\sigma^2$ is the population variance and $\mu$ is the population mean for a certain trait x1

Statistic = value of this numerical constant calculated on the sample and used to estimate the parameter
  Example: $s^2$ is the sample variance and $\bar{x}$ is the sample mean

Summary statistics allow us to compare populations and estimate the parameters.

Statistics are divided into 5 categories:
- Descriptive
- Tests of difference
- Tests of relationship
- Multivariate exploratory methods
- Estimators of population parameters

Central Tendency: Arithmetic Mean
Sample mean: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Population mean: $\mu = \dfrac{1}{N}\sum_{i=1}^{N} X_i$

Calculate the average fitness of a population. From your sample of the population, categorize individuals into groups:

#    Genotype  Fitness
25   AA        0.7
50   Aa        0.5
25   aa        0.4

Sum over categories of (freq. of category)(value of category):
(0.25)(0.7) + (0.5)(0.5) + (0.25)(0.4) = average fitness
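The average-fitness calculation is a frequency-weighted mean, which can be sketched in Python as follows. The dictionary layout and the name `mean_fitness` are illustrative assumptions; the counts and fitness values are the ones in the example above.

```python
# Counts and fitness values from the worked example above.
counts = {"AA": 25, "Aa": 50, "aa": 25}
fitness = {"AA": 0.7, "Aa": 0.5, "aa": 0.4}


def mean_fitness(counts, fitness):
    """Weighted mean: sum of (frequency of category) * (value of category)."""
    total = sum(counts.values())
    return sum((counts[g] / total) * fitness[g] for g in counts)


print(mean_fitness(counts, fitness))
```

Running it lets you check your hand calculation of the blank left in the notes.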

The measure of variability or dispersion of points around the mean is the variance.
Population variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

The standard deviation (SD) is the square root of $s^2$. Remember that, for a normal distribution, the mean ± 1 SD covers about 68% of the central area and the mean ± 2 SD covers about 95%.

Do not confuse SE with SD. The SD measures the dispersion of the underlying raw data, and the SE (standard error) measures the dispersion of a sample statistic. For example, the SE describes the sampling distribution of the sample mean heterozygosity, while the SD describes the distribution of the raw heterozygosity values.

Geometric mean: the nth root of the product of n numbers; used in growth rate estimates.
Harmonic mean: the reciprocal of the mean of the reciprocals; weighted toward the smallest values; used in calculating the effective population size.
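The following sketch uses Python's standard `statistics` module to contrast these quantities. All of the numbers are made up for illustration (hypothetical per-locus heterozygosities and hypothetical population sizes over three generations), and the SE is computed with the usual formula SD divided by the square root of n, which the notes do not state explicitly.

```python
import math
import statistics

# Hypothetical per-locus heterozygosity values (illustration only).
heterozygosity = [0.48, 0.52, 0.40, 0.60, 0.50]

pop_var = statistics.pvariance(heterozygosity)   # divides by N
samp_var = statistics.variance(heterozygosity)   # divides by n - 1
sd = statistics.stdev(heterozygosity)            # square root of sample variance
se = sd / math.sqrt(len(heterozygosity))         # dispersion of the sample mean

# Hypothetical population sizes with one bottleneck generation.
sizes = [100, 10, 100]
arith = statistics.mean(sizes)
geo = statistics.geometric_mean(sizes)
harm = statistics.harmonic_mean(sizes)

print("population variance:", pop_var)
print("sample variance:", samp_var)
print("SD:", sd, "SE:", se)
print("arithmetic, geometric, harmonic means:", arith, geo, harm)
```

Note how the single small generation pulls the harmonic mean far below the arithmetic mean, which is why it is the one used for effective population size.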

POPULATIONS:
- Group of organisms (species) living within a sufficiently restricted geographic area with random mating
- Local interbreeding population
- Local populations or demes (Mendelian populations or subpopulations)

THE MODEL OF RANDOM MATING:
[Figure: the parent population, with genotype frequencies P(AA), P(Aa), and P(aa), contributes A and a alleles to an allele pool, from which the genotype frequencies of the new population are formed.]

NON-OVERLAPPING GENERATIONS
Mostly insects and plants. While simple, the model works for a lot of organisms with complex life histories:
generation t-1 -> generation t -> generation t+1

HARDY-WEINBERG MODEL
GH Hardy & W Weinberg 1908 (independently); WE Castle (1903, Harvard geneticist)

Assumptions of the HW Principle
1. Diploid population (2N)
2. Sexual reproduction, no selfing
3. Non-overlapping generations
4. Locus with 2 alleles
5. Allele frequencies are equal in males and females
6. Random mating
7. Infinite population size
8. Mutation ignored
9. Natural selection doesn't affect the alleles considered

Model with Theoretical Predictions
[Figure: Gen 1 and Gen 2 along a time axis]

p = frequency of A allele
q = frequency of a allele
p + q = 1

Independent trials: $(pA + qa) \times (pA + qa) = 1$ (all genotypes)
So $p^2 + 2pq + q^2 = 1$

(1) Equilibrium allele frequencies: after one round of random mating, $p'$ and $p'^2$ are equal to $p$ and $p^2$, where the prime denotes the next generation.
(2) What about random union of gametes?
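A short Python sketch can illustrate the equilibrium claim: starting from Hardy-Weinberg proportions, the allele frequency recovered from the genotypes is unchanged, so the next generation has the same genotype frequencies. The function names and the starting value p = 0.3 are assumptions chosen for the illustration.

```python
def hardy_weinberg(p):
    """Genotype frequencies p^2, 2pq, q^2 for allele frequency p of A."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_freq_A(genotypes):
    """Frequency of A: all alleles in AA plus half of the alleles in Aa."""
    return genotypes["AA"] + 0.5 * genotypes["Aa"]


p = 0.3                      # hypothetical starting frequency
gen1 = hardy_weinberg(p)     # genotype frequencies after random mating
p_next = allele_freq_A(gen1) # allele frequency among those genotypes
gen2 = hardy_weinberg(p_next)

print("generation 1:", gen1)
print("allele frequency in next generation:", p_next)
print("generation 2:", gen2)
```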

EXAMPLE: If we have a single locus with two alleles, A1 and A2
Let:
p = frequency of A1 allele
q = frequency of A2 allele

What are the three possible genotypes?

The allele frequencies can be estimated from the genotype frequencies:

Now if there is random mating, what is the frequency of genotypes in the next generation? What are the progeny genotypes given the adult genotypes and random mating? In the table, P, Q, and R are the frequencies of the adult genotypes A1A1, A1A2, and A2A2.

                           Frequency of zygotes (progeny)
Mating          Frequency  A1A1   A1A2   A2A2
A1A1 x A1A1     P^2        1      0      0
A1A1 x A1A2     2PQ        1/2    1/2    0
A1A1 x A2A2     2PR        0      1      0
A1A2 x A1A2     Q^2        1/4    1/2    1/4
A1A2 x A2A2     2QR        0      1/2    1/2
A2A2 x A2A2     R^2        0      0      1
New genotypes              P'     Q'     R'     (P' + Q' + R' = 1)

A1A1: $P' = P^2 + \dfrac{2PQ}{2} + \dfrac{Q^2}{4} = \ldots = p^2$
A1A2: $Q' = \dfrac{2PQ}{2} + 2PR + \dfrac{Q^2}{2} + \dfrac{2QR}{2} = \ldots = 2pq$
A2A2: $R' = R^2 + \dfrac{2QR}{2} + \dfrac{Q^2}{4} = \ldots = q^2$

For extra credit on your homework this week, can you prove the connection of the equation for P' to $p^2$, Q' to $2pq$, and R' to $q^2$?
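The mating table can be checked numerically (this is a check, not the algebraic proof asked for in the extra credit). The sketch below uses exact fractions and made-up starting genotype frequencies that are deliberately not in Hardy-Weinberg proportions. It assumes the standard allele-counting identities p = P + Q/2 and q = R + Q/2; the function name `next_generation` is hypothetical.

```python
from fractions import Fraction as F


def next_generation(P, Q, R):
    """Offspring genotype frequencies (A1A1, A1A2, A2A2) under random mating."""
    # Each entry: (frequency of the mating, offspring proportions per genotype)
    matings = [
        (P * P,     (F(1),    F(0),    F(0))),     # A1A1 x A1A1
        (2 * P * Q, (F(1, 2), F(1, 2), F(0))),     # A1A1 x A1A2
        (2 * P * R, (F(0),    F(1),    F(0))),     # A1A1 x A2A2
        (Q * Q,     (F(1, 4), F(1, 2), F(1, 4))),  # A1A2 x A1A2
        (2 * Q * R, (F(0),    F(1, 2), F(1, 2))),  # A1A2 x A2A2
        (R * R,     (F(0),    F(0),    F(1))),     # A2A2 x A2A2
    ]
    return tuple(sum(freq * off[k] for freq, off in matings) for k in range(3))


# Hypothetical adult genotype frequencies (not in HW proportions).
P, Q, R = F(1, 2), F(1, 5), F(3, 10)
p = P + Q / 2  # frequency of A1 (assumed allele-counting identity)
q = R + Q / 2  # frequency of A2

print("new genotype frequencies:", next_generation(P, Q, R))
print("p^2, 2pq, q^2:           ", (p * p, 2 * p * q, q * q))
```

Because the arithmetic is exact, the two printed tuples can be compared directly for any starting P, Q, R that sum to 1.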

EXAMPLE

Measures of Genetic Diversity - Allozyme Data
There are two standard measures of allozyme diversity.

(1) P, the proportion of sampled loci that are polymorphic
$P = x/m$
x is the number of polymorphic loci in a sample of m loci
Note: You'll often see this used as a measure of diversity for allozyme loci. However, it is sensitive to sampling: with few individuals, a locus may appear monomorphic yet prove polymorphic once more individuals are sampled (see below). It is therefore not a good measure for highly polymorphic loci.

(2) H, mean heterozygosity
Sample a locus with two alleles at frequencies of 0.4 and 0.6.
Let $p_1 = 0.4$ and $p_2 = 0.6$
Homozygotes: $p_1^2 = 0.16$; $p_2^2 = 0.36$
Therefore $1 - (0.16 + 0.36) = 0.48$ (48% heterozygotes)
Average over all loci, including monomorphic ones!

General equation (Nei 1987): $H = 1 - \sum_{i=1}^{n} p_i^2$
Unbiased estimate: $\hat{H} = \dfrac{2N}{2N - 1}\left(1 - \sum_{i=1}^{n} p_i^2\right)$, where N is the number of individuals sampled

Note: The general equation for expected heterozygosity is often referred to as a measure of diversity. We use this equation for more than just allozymes, and it's fundamental to understand for measuring divergence among populations (F-statistics). I like to think of the measure as the probability of an individual being heterozygous at a given locus. Many human microsatellite loci have values > 0.85, which means you have a > 85% chance of being heterozygous at such a locus. I'll break down the equation here and we will talk about it more in class.

In the equation above, $p_i$ is the frequency of the ith allele of n alleles at a locus. For example, $p_1, p_2, p_3$ could correspond to p, q, r.
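Expected heterozygosity is easy to compute from allele frequencies. The sketch below is illustrative only: the function names are hypothetical, the small-sample correction is assumed to be the commonly used 2N/(2N - 1) factor with N diploid individuals, and the second locus in the usage example is an invented monomorphic locus included to show why monomorphic loci lower the mean.

```python
def expected_heterozygosity(freqs):
    """H = 1 - sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def unbiased_heterozygosity(freqs, n_individuals):
    """Small-sample correction, assumed here to be 2N / (2N - 1) times H."""
    two_n = 2 * n_individuals
    return two_n / (two_n - 1) * expected_heterozygosity(freqs)


def mean_heterozygosity(loci):
    """Average H over all loci, monomorphic loci included."""
    return sum(expected_heterozygosity(f) for f in loci) / len(loci)


print(expected_heterozygosity([0.4, 0.6]))       # the two-allele example
print(unbiased_heterozygosity([0.4, 0.6], 10))   # hypothetical sample of 10
print(mean_heterozygosity([[0.4, 0.6], [1.0]]))  # one polymorphic, one monomorphic
```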

Remember the HW proportions equation $p^2 + 2pq + q^2 = 1$; then this follows:
Rearrange the above equation: $p^2 + q^2 + 2pq = 1$
Solve for heterozygotes: $2pq = 1 - (p^2 + q^2)$

When a locus has 4 or more alleles, which is true for many loci, it becomes much easier to sum the homozygous genotype frequencies than the heterozygous ones. For example, if you have A = 6 alleles, there are 21 possible genotypes:
$\dfrac{A(A + 1)}{2} = \dfrac{6 \times 7}{2} = 21$
Of these 21 possible genotypes, only 6 are homozygous (A1A1, A2A2, A3A3, etc.), but 15 are heterozygous. As the number of alleles increases, it is easier to square the allele frequencies to get the homozygote frequencies, and subtract their sum from 1, than to add up every heterozygote frequency.

Heterozygosity = 1 - (sum of all the homozygous frequencies)
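The genotype counts and the shortcut can both be verified by enumeration. In this sketch the six allele frequencies are invented for illustration (they only need to sum to 1), and the variable names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement

alleles = ["A1", "A2", "A3", "A4", "A5", "A6"]

# Every unordered pair of alleles, including an allele paired with itself.
genotypes = list(combinations_with_replacement(alleles, 2))
homozygous = [g for g in genotypes if g[0] == g[1]]
heterozygous = [g for g in genotypes if g[0] != g[1]]
print(len(genotypes), len(homozygous), len(heterozygous))

# Hypothetical allele frequencies that sum to 1.
freqs = [0.30, 0.25, 0.20, 0.10, 0.10, 0.05]

# Long way: add 2 * p_i * p_j over all 15 heterozygous combinations.
h_long = sum(2 * pi * pj for pi, pj in combinations(freqs, 2))
# Short way: 1 minus the sum of the 6 homozygote frequencies.
h_short = 1 - sum(p * p for p in freqs)
print(h_long, h_short)
```

Both routes give the same heterozygosity, but the short way needs only one term per allele.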

Measures of Genetic Diversity - Microsatellite Data
There are 4 standard measures of microsatellite diversity.

(1) P, the proportion of sampled loci that are polymorphic
$P = x/m$
x is the number of polymorphic loci in a sample of m loci

(2) HE, expected heterozygosity (Nei 1987): a general measure of genetic diversity
Problem: diversity is high because of the high mutation rate

[Figure: average number of alleles captured (all loci combined) versus sample size (2N) for populations SM, BC, MB, and FB]

(3) A, allele number: more sensitive to loss of genetic variation
# of alleles per locus in each population

(4) Rg, allelic richness
Samples alleles at individual loci at the same sample size among populations, using a rarefaction method to estimate allelic richness. The subscript g is the number of genes sampled.

Locus m.2: allele counts by repeat number
(Unique = number of unique alleles; Alleles = # of alleles)

Location                  11  12  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  Total  Unique  Alleles
Big Creek Adults           0   0   0   0   0   8   2   1  12   2  19   5   6   8   6   5   0   2   0     76       0       12
Monterey Bay Adults        0   0   0   2   0   4   1   0   5   3   7   5   2   2   5   0   2   0   0     38       1       11
Fort Bragg Adults          1   0   1   0   0  14   3   0  14   3  24   6  14  14  11   7   2   0   0    114       0       13
San Miguel Is. Adults      1   0   4   0   1  18   7   1  15   3  15   9  20  11   7   9   1   1   1    124       2       17
Fort Ross Juveniles        0   1   2   0   1  19   5   0  61  25  31  10   8  25  18   4   3   2   0    215       1       16
Monterey Bay Juveniles     0   0  32   2   4  73  68   6 107  15  57  45 103  74  33  18   3   4   8    652       1       17
Carmel Bay Juveniles       0   0   4   0   0  11   8   2  14   3   6   4  12  10   3   2   0   0   1     80       0       13
Total                      2   1  43   4   6 147  94  10 228  54 159  84 165 144  83  45  11   9  10   1299       5       99

Allele number (A) = # of alleles in the population
Big Creek Adults, locus m.2: A = 12
Monterey Bay Juveniles, locus m.2: A = 17
Big difference in sample size (76 vs. 652)!

Allelic richness (Rg) measures the # of alleles using a sample of N individuals, set to the smallest population sample size across all loci (N = 38).
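The notes do not give the rarefaction formula, so the sketch below assumes the commonly used hypergeometric form: the expected number of alleles in a random subsample of g gene copies is the sum, over alleles, of 1 - C(N - N_i, g) / C(N, g), where N is the total count and N_i the count of allele i. The function name is hypothetical; the counts are the locus m.2 rows for Big Creek Adults, Monterey Bay Adults, and Monterey Bay Juveniles.

```python
from math import comb


def allelic_richness(allele_counts, g):
    """Expected number of alleles in a subsample of g gene copies.

    Assumes hypergeometric rarefaction: each allele is counted with the
    probability that at least one of its copies lands in the subsample.
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("subsample size g cannot exceed the sample size")
    return sum(1 - comb(n - c, g) / comb(n, g) for c in allele_counts if c > 0)


big_creek_adults = [0, 0, 0, 0, 0, 8, 2, 1, 12, 2, 19, 5, 6, 8, 6, 5, 0, 2, 0]
monterey_bay_adults = [0, 0, 0, 2, 0, 4, 1, 0, 5, 3, 7, 5, 2, 2, 5, 0, 2, 0, 0]
monterey_bay_juveniles = [0, 0, 32, 2, 4, 73, 68, 6, 107, 15, 57, 45, 103, 74,
                          33, 18, 3, 4, 8]

g = 38  # smallest sample in the table (Monterey Bay Adults)
for name, counts in [("Big Creek Adults", big_creek_adults),
                     ("Monterey Bay Adults", monterey_bay_adults),
                     ("Monterey Bay Juveniles", monterey_bay_juveniles)]:
    observed = sum(1 for c in counts if c > 0)
    print(name, "A =", observed, "R38 =", round(allelic_richness(counts, g), 2))
```

Rarefying every population to the same g removes the advantage that large samples have in raw allele number.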

Measures of Genetic Variation Using Sequence Data

1. Nucleotide diversity, π
$\pi = \dfrac{n}{n-1} \sum_{i}\sum_{j} x_i x_j \pi_{ij}$
$x_i$ = frequency of the ith haplotype (its count divided by the total number of sequences sampled)
$n/(n-1)$ = sampling error term, where n is the number of sequences (gene copies) sampled
$\pi_{ij}$ = proportion of nucleotides that differ between haplotype i and haplotype j

2. The number of segregating sites, θ (theta)
Infinite-alleles model: $\theta = 4N_e\mu$
$S = n_p/n_t$, the number of polymorphic sites over the total number of sites
Here is how we estimate θ: $S = a_1\theta$, where $a_1 = \sum_{i=1}^{n-1} \dfrac{1}{i}$
Which we can rearrange to be $\theta = S/a_1$

At steady state in the infinite-alleles model, π = θ

Estimating π and θ from DNA Sequence Data: An Example
-We collected a sample of 5 banana slugs from the woods outside of the UC Santa Cruz campus in California
-We sequenced a 500 bp region of the mitochondrial COI gene and observed 5 segregating sites in four distinct haplotypes

                  Nucleotide site in gene
             N    4    45   345  398  456
Haplotype 1  2    T    G    T    C    T
Haplotype 2  1    T    A    T    T    A
Haplotype 3  1    C    G    T    C    T
Haplotype 4  1    C    G    G    C    T

1. Proportion of polymorphic sites (referred to as P or S)

2. Nucleotide diversity, π
$\pi = \dfrac{n}{n-1} \sum_{i}\sum_{j} x_i x_j \pi_{ij}$
n = 5, the number of sequences sampled; therefore n/(n-1) = 5/4

Haplotype frequencies:
Hap1  0.4 (note that there are 2 copies of Haplotype 1)
Hap2  0.2
Hap3  0.2
Hap4  0.2

Pairwise differences ($\pi_{ij}$):
Hap1 & Hap2  0.006 (3 pairwise differences out of 500 possible)
Hap1 & Hap3  0.002
Hap1 & Hap4  0.004
Hap2 & Hap3  0.008
Hap2 & Hap4  0.01
Hap3 & Hap4  0.002

Make a matrix to sum:

Hap (i)  Hap (j)  xi   xj   πij    xi xj πij
1        1        0.4  0.4  0      0
1        2        0.4  0.2  0.006  0.00048
1        3        0.4  0.2  0.002  0.00016
1        4        0.4  0.2  0.004  0.00032
2        1        0.2  0.4  0.006  0.00048
2        2        0.2  0.2  0      0
2        3        0.2  0.2  0.008  0.00032
2        4        0.2  0.2  0.01   0.0004
3        1        0.2  0.4  0.002  0.00016
3        2        0.2  0.2  0.008  0.00032
3        3        0.2  0.2  0      0
3        4        0.2  0.2  0.002  0.00008
4        1        0.2  0.4  0.004  0.00032
4        2        0.2  0.2  0.01   0.0004
4        3        0.2  0.2  0.002  0.00008
4        4        0.2  0.2  0      0
Σ                                  0.00352

$\pi = \dfrac{n}{n-1} \sum_{i}\sum_{j} x_i x_j \pi_{ij} = \dfrac{5}{4} \times 0.00352 = 0.0044$
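The same matrix sum can be reproduced from the haplotype data in Python. This is a sketch under the assumptions of the worked example: only the 5 variable sites are stored (the other 495 sites are identical in all sequences), and the dictionary layout and function name are hypothetical.

```python
from itertools import product

# Variable sites only (positions 4, 45, 345, 398, 456) and haplotype counts.
haplotypes = {
    "Hap1": ("TGTCT", 2),
    "Hap2": ("TATTA", 1),
    "Hap3": ("CGTCT", 1),
    "Hap4": ("CGGCT", 1),
}
SEQ_LENGTH = 500  # total sites sequenced


def nucleotide_diversity(haplotypes, seq_length):
    """pi = n/(n-1) * sum over all ordered pairs (i, j) of x_i * x_j * pi_ij."""
    n = sum(count for _, count in haplotypes.values())
    total = 0.0
    for (seq_i, n_i), (seq_j, n_j) in product(haplotypes.values(), repeat=2):
        differences = sum(a != b for a, b in zip(seq_i, seq_j))
        pi_ij = differences / seq_length  # proportion of differing sites
        total += (n_i / n) * (n_j / n) * pi_ij
    return n / (n - 1) * total


print(round(nucleotide_diversity(haplotypes, SEQ_LENGTH), 4))
```

Summing over ordered pairs mirrors the 16-row matrix above, where each pair of different haplotypes appears twice and each haplotype paired with itself contributes 0.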

3. Segregating sites, θ
$S = n_p/n_t$ and $\theta = S/a_1$
S = # of segregating sites / total number of sites analyzed
S = 5/500 = 0.01
$a_1 = \dfrac{1}{1} + \dfrac{1}{2} + \cdots + \dfrac{1}{n-1} = \dfrac{1}{1} + \dfrac{1}{2} + \dfrac{1}{3} + \dfrac{1}{4} = 2.083$
Note: n is the number of alleles (sequences) sampled. In the example above you have 5 sequences, so you sum 1/i for i from 1 to n-1 = 4 to calculate $a_1$.
θ = S/a1 = 0.010/2.083 = 0.0048

Notice that both estimates of nucleotide diversity are similar (π = 0.0044 and θ = 0.0048, so π ≈ θ), which indicates steady state.
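The θ calculation can be sketched in a few lines of Python. The function name and argument names are hypothetical; the inputs are the values from the banana slug example (5 segregating sites, 500 sites, 5 sequences).

```python
def theta_from_segregating_sites(n_segregating, n_sites, n_sequences):
    """theta = S / a1, with S per site and a1 = 1/1 + 1/2 + ... + 1/(n-1)."""
    s = n_segregating / n_sites
    a1 = sum(1 / i for i in range(1, n_sequences))  # i runs from 1 to n-1
    return s / a1


theta = theta_from_segregating_sites(n_segregating=5, n_sites=500, n_sequences=5)
print(round(theta, 4))
```

Because a1 grows with the number of sequences, the same number of segregating sites gives a smaller θ in a larger sample.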