ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG Human Population Genomics

Heritability & Environment. Figure: feasibility of identifying genetic variants by risk allele frequency and strength of genetic effect (odds ratio). T. A. Manolio et al., Nature 461, 747-753 (2009), doi:10.1038/nature08494

Global Ancestry Inference. Genotype data: $G_1, \dots, G_N$; $G_i = g_{i1} \dots g_{in}$; $g_{ij} \in \{0, 1, 2\}$. Figure source: Nature. 2008 November 6; 456(7218): 98-101.

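Modeling population haplotypes. VLMC (variable-length Markov chain); example haplotypes in the figure: 0000, 0001, 0011, 0110, 1000, 1001, 1011. Genotypes: $G_1, \dots, G_N$; $G_i = g_{i1} \dots g_{in}$; $g_{ij} \in \{0, 1, 2\}$. Each genotype is the sum of two haplotypes: $G_i = H_{i1} + H_{i2}$, where $H_{ij} = h_{ij1} \dots h_{ijn}$; $h_{ijk} \in \{0, 1\}$. Browning, 2006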

Phasing Browning & Browning, 2007
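The decomposition $G_i = H_{i1} + H_{i2}$ does not determine the two haplotypes uniquely, which is why phasing is needed. The sketch below is only an illustration of that ambiguity, not the Browning & Browning method: the function name `consistent_haplotype_pairs` and the example genotype are hypothetical.

```python
from itertools import product

def consistent_haplotype_pairs(genotype):
    """Return all unordered haplotype pairs (h1, h2) with h1 + h2 == genotype.

    genotype: sequence of minor-allele counts in {0, 1, 2}, one per site.
    """
    het_sites = [j for j, g in enumerate(genotype) if g == 1]
    pairs = set()
    # Homozygous sites fix both haplotypes; only heterozygous sites are
    # ambiguous, so enumerate the allele carried by h1 at each of them.
    for choice in product((0, 1), repeat=len(het_sites)):
        h1 = [g // 2 for g in genotype]  # 0 -> 0, 2 -> 1; het sites set below
        h2 = list(h1)
        for j, allele in zip(het_sites, choice):
            h1[j], h2[j] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

# Hypothetical genotype with two heterozygous sites.
pairs = consistent_haplotype_pairs([1, 2, 1, 0])
for h1, h2 in pairs:
    print(h1, h2)
```

A genotype with h heterozygous sites (h >= 1) is consistent with 2^(h-1) unordered haplotype pairs, so the number of candidate phasings grows exponentially with the number of heterozygous sites.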

Identity By Descent (IBD)

IBD detection. Figure labels: IBD = F, IBD = T. Parente: Rodriguez et al. 2013. FastIBD: sample haplotypes for each individual, check for IBD (Browning & Browning 2011)

Mexican Ancestry The genetics of Mexico recapitulates Native American substructure and affects biomedical traits, Moreno-Estrada et al. Science, 2014.

Population Sequencing
1,000 to 1,000,000 samples, each sequenced at coverage C = 2-6x (figure: per-sample genotypes such as A/T, C/C, C/G, A/A, C/T, G/G).
Genotypes: $G_1, \dots, G_N$; $G_i = g_{i1} \dots g_{in}$; $g_{ij} \in \{0, 1, 2\}$.
Genotype posteriors: $P_1, \dots, P_N$; $P_i$: [ $p_{ijk} = \mathrm{Prob}(g_{ij} = k \mid \text{data})$ ].
When C is high (>30x), $\mathrm{Prob}(g_{ij} = k \mid \text{data}) \approx \mathrm{Prob}(g_{ij} = k \mid \text{reads mapping on } (i, j))$: fast & easy.
When C is low, $\mathrm{Prob}(g_{ij} = k \mid \text{data})$ needs to leverage LD, i.e. positions $j' \neq j$ in all individuals: in principle, intractable.
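The high-coverage case, where the posterior depends only on the reads at (i, j), can be sketched with a simple binomial read model. This is an assumed model for illustration, not one stated on the slide: the error rate, the flat prior, and the function name `genotype_posterior` are all hypothetical.

```python
from math import comb

def genotype_posterior(ref_reads, alt_reads, error_rate=0.01,
                       prior=(1 / 3, 1 / 3, 1 / 3)):
    """Posterior over g in {0, 1, 2} from the reads at one sample and site.

    Assumed model: each read independently shows the alternate allele with
    probability error_rate, 0.5, or 1 - error_rate for g = 0, 1, 2.
    """
    alt_prob = (error_rate, 0.5, 1.0 - error_rate)
    n = ref_reads + alt_reads
    likelihood = [comb(n, alt_reads) * p ** alt_reads * (1 - p) ** ref_reads
                  for p in alt_prob]
    unnormalized = [l * pr for l, pr in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

high = genotype_posterior(ref_reads=15, alt_reads=15)  # about 30x
low = genotype_posterior(ref_reads=1, alt_reads=0)     # about 1x
print("high coverage:", [round(p, 4) for p in high])
print("low coverage: ", [round(p, 4) for p in low])
```

Under these assumptions the 30x call is essentially certain, while a single read leaves substantial probability on more than one genotype, which is the motivation for borrowing information from other positions through LD.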

Population Sequencing: Summarization-Maximization (1,000 to 1,000,000 samples at 2-6x)
Notation: $G_1, \dots, G_N$; $G_i = g_{i1} \dots g_{in}$; $g_{ij} \in \{0, 1, 2\}$; $P_i$: [ $p_{ijk} = \mathrm{Prob}(g_{ij} = k \mid \text{data})$ ].
1. Identify candidate polymorphic sites
2. Initialize $G^{(0)}$
3. Summarization: $p^{(n+1)}_{ijk} = \mathrm{Prob}(g_{ij} = k \mid G^{(n)}, \text{data})$
4. Maximization: $g^{(n+1)}_{ij} = \arg\max_k p^{(n+1)}_{ijk}$
5. Repeat until convergence

Modeling LD: Nearest Neighbors
Reads covering the minor allele at sites i and j:
  Sample    site i   site j
  1         1        1
  2         1        0
  3         3        5
  4         0        2
  ...
  10        1        0
Let $S_i$ = { samples with >= one read covering the minor allele at site i }. Here $S_i = \{1, 2, 3, 10\}$ and $S_j = \{1, 3, 4\}$.
Then $\mathrm{Sim}_1(i, j) = |S_i \cap S_j| / |S_i \cup S_j| = 2/5$
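The site similarity defined on this slide is the Jaccard index of the two carrier sets. A minimal sketch using the read counts from the slide's table (the function names `carriers` and `jaccard` are illustrative, and samples 5 to 9 are omitted because the slide elides them):

```python
def carriers(read_counts):
    """Samples with at least one read covering the minor allele."""
    return {sample for sample, count in read_counts.items() if count >= 1}

def jaccard(a, b):
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Minor-allele read counts per sample, taken from the slide's table.
site_i = {1: 1, 2: 1, 3: 3, 4: 0, 10: 1}
site_j = {1: 1, 2: 0, 3: 5, 4: 2, 10: 0}

s_i, s_j = carriers(site_i), carriers(site_j)
# Intersection is {1, 3}; union is {1, 2, 3, 4, 10}.
similarity = jaccard(s_i, s_j)
print(sorted(s_i), sorted(s_j), similarity)
```

Ranking all other sites by this similarity and keeping the top few would give the nearest neighbors of site i; how many neighbors to keep is not stated on the slide.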

Reveel Algorithm: Summarization-Maximization, step 1 (identify candidate polymorphic sites). Candidate polymorphic site: essentially, a position j where some individuals have at least 2 reads with the same minor allele.

Reveel Algorithm: Summarization-Maximization (same five steps). At each position j, use the sum of read counts at j and its nearest neighbors.

Reveel Algorithm: calculate $P^{(n+1)}$
$p^{(n+1)}_{ijk} = P(g_{ij} = k \mid G^{(n)}, \text{reads}) \approx P(g_{ij} = k \mid g_{knn}, \text{reads}) \propto P(\text{reads} \mid g_{ij} = k) \, P(g_{ij} = k \mid g_{knn})$
$P(\text{reads} \mid g_{ij} = k)$: easy.
$P(g_{ij} = k \mid g_{knn})$: let $C_0, C_1, C_2$ = number of samples matching sample i at the nearest-neighbor sites in $G^{(n)}$ whose genotype at position j is 0, 1, 2 respectively. Then $P(g_{ij} = k \mid g_{knn}) = C_k / (C_0 + C_1 + C_2)$.
Notation: $G_1, \dots, G_N$; $G_i = g_{i1} \dots g_{in}$; $g_{ij} \in \{0, 1, 2\}$; $P_i$: [ $p_{ijk} = \mathrm{Prob}(g_{ij} = k \mid \text{data})$ ].
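One summarization and maximization update for a single (i, j) can be sketched as follows. This is a reading of the slide under stated assumptions rather than the published implementation: the binomial read model, the error rate, the exclusion of sample i from its own counts, the flat-prior fallback when no sample matches, and all function names are hypothetical.

```python
def knn_prior(G, i, j, knn_sites):
    """C_k / (C_0 + C_1 + C_2) over samples matching sample i at knn_sites."""
    counts = [0, 0, 0]
    for s, row in enumerate(G):
        # Assumption: sample i does not vote for its own genotype.
        if s != i and all(row[m] == G[i][m] for m in knn_sites):
            counts[row[j]] += 1
    total = sum(counts)
    if total == 0:
        return [1 / 3, 1 / 3, 1 / 3]  # assumption: flat prior if no match
    return [c / total for c in counts]

def read_likelihood(ref_reads, alt_reads, error_rate=0.01):
    """P(reads | g = k) for k = 0, 1, 2 under an assumed binomial model."""
    alt_prob = (error_rate, 0.5, 1.0 - error_rate)
    return [p ** alt_reads * (1 - p) ** ref_reads for p in alt_prob]

def update(G, i, j, knn_sites, ref_reads, alt_reads):
    """Return (posterior over k, argmax genotype) for sample i at site j."""
    prior = knn_prior(G, i, j, knn_sites)
    lik = read_likelihood(ref_reads, alt_reads)
    unnormalized = [l * p for l, p in zip(lik, prior)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    return posterior, max(range(3), key=posterior.__getitem__)

# Hypothetical current genotypes G^(n): rows are samples, columns are sites.
G = [
    [1, 0, 1],  # sample 0, the one being updated at site 1
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 2],  # does not match sample 0 at sites 0 and 2
    [1, 2, 1],
]
prior = knn_prior(G, i=0, j=1, knn_sites=[0, 2])
posterior, call = update(G, i=0, j=1, knn_sites=[0, 2], ref_reads=0, alt_reads=1)
print(prior, [round(p, 4) for p in posterior], call)
```

With a single alternate read, the likelihood alone would favor the homozygous-alternate genotype; the neighbor-based prior shifts the call to heterozygous in this toy example.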

Fixation, Positive & Negative Selection. Figure panels: Negative Selection, Neutral Drift, Positive Selection. How can we detect negative selection? How can we detect positive selection?

How can we detect positive selection? Ka/Ks ratio: the ratio of the nonsynonymous (Ka) to the synonymous (Ks) substitution rate. It detects very old, persistent, strong positive selection on a protein that keeps adapting. Examples: immune response, spermatogenesis.
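As a rough illustration of the ratio, the sketch below computes Ka/Ks from already-counted substitutions and sites, applying the Jukes-Cantor correction to each proportion. Counting synonymous and nonsynonymous sites from a codon alignment is the harder part and is not shown; the input counts and function names are hypothetical.

```python
from math import log

def jukes_cantor(p):
    """Correct an observed proportion of differences for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)

def ka_ks(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ka/Ks from substitution counts and site counts (all given as input)."""
    ka = jukes_cantor(nonsyn_subs / nonsyn_sites)
    ks = jukes_cantor(syn_subs / syn_sites)
    return ka / ks  # raises ZeroDivisionError if there are no synonymous changes

# Hypothetical counts for one gene compared between two species.
ratio = ka_ks(nonsyn_subs=10, nonsyn_sites=300, syn_subs=10, syn_sites=100)
print(round(ratio, 3))
```

A ratio well below 1 is usually read as purifying (negative) selection, a ratio near 1 as neutral evolution, and a ratio above 1 as evidence of positive selection.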

How can we detect positive selection?

Positive Selection in Human Lineage

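Long Haplotypes: EHH and iHS tests. A recently selected allele has had less time to accumulate changes: fewer mutations and fewer recombinations, so the haplotype around it stays long.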
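Extended haplotype homozygosity (EHH) can be sketched as the probability that two randomly chosen chromosomes carrying the same core allele are identical from the core out to a given marker. The haplotype strings below are made up for illustration, and the function name `ehh` is hypothetical.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, end):
    """EHH from marker index `core` out to marker index `end` (inclusive).

    haplotypes: equal-length strings, all carrying the same core allele.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, end))
    # Group chromosomes that are identical over the whole interval.
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)

# Hypothetical haplotypes; index 0 is the core marker.
haps = ["1000", "1000", "1000", "1001", "1010"]
decay = [ehh(haps, core=0, end=x) for x in range(4)]
print(decay)
```

EHH starts at 1 at the core and decays with distance as mutation and recombination break up the haplotype; a slow decay around a common allele is the signature the long-haplotype tests look for.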

Application: Malaria. Study of genes known to be implicated in resistance to malaria. Malaria is an infectious disease caused by protozoan parasites of the genus Plasmodium. It is frequent in tropical and subtropical regions and is transmitted by the Anopheles mosquito. Image source: wikipedia.org. Slide Credits: Marc Schaub

Application: Malaria. Image source: NIH - http://history.nih.gov/exhibits/bowman/images/malariacyclebig.jpg. Slide Credits: Marc Schaub

Application: Malaria. Image source: CDC - http://www.dpd.cdc.gov/dpdx/images/parasiteimages/m-r/malaria/malaria_risk_2003.gif. Slide Credits: Marc Schaub

Results: G6PD Source: Sabeti et al. Nature 2002. Slide Credits: Marc Schaub

Results: TNFSF5 Source: Sabeti et al. Nature 2002. Slide Credits: Marc Schaub

Malaria and Sickle-cell Anemia. Allison (1954): Sickle-cell anemia is limited to the region in Africa in which malaria is endemic. Figure panels: distribution of malaria; distribution of sickle-cell anemia. Image source: wikipedia.org. Slide Credits: Marc Schaub

Malaria and Sickle-cell Anemia. Single point mutation in the coding region of the Hemoglobin-B gene (Glu → Val). Heterozygote advantage: resistance to malaria, slight anemia. Image source: wikipedia.org. Slide Credits: Marc Schaub
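Heterozygote advantage keeps both alleles in the population at a stable intermediate frequency. The sketch below uses the standard one-locus selection model with relative fitnesses 1 - s, 1, and 1 - t for the normal homozygote, the heterozygote, and the sickle-cell homozygote. The selection coefficients are made-up values for illustration, not estimates from the slide.

```python
def next_frequency(q, s, t):
    """Sickle-allele frequency after one generation of selection.

    Relative fitnesses: AA = 1 - s, AS = 1, SS = 1 - t (random mating).
    """
    p = 1.0 - q
    w_aa, w_as, w_ss = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss
    return (q * q * w_ss + p * q * w_as) / mean_fitness

def equilibrium_frequency(s, t):
    """Stable equilibrium under heterozygote advantage: q = s / (s + t)."""
    return s / (s + t)

s, t = 0.1, 0.8  # hypothetical: malaria cost to AA, anemia cost to SS
q = 0.01         # start with a rare sickle allele
for _ in range(500):
    q = next_frequency(q, s, t)
print(round(q, 4), round(equilibrium_frequency(s, t), 4))
```

The allele rises while rare because carriers are protected, then stalls once the cost to homozygotes balances that benefit, so it neither fixes nor disappears while malaria remains endemic.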

Lactose Intolerance. Source: Ingram and Swallow. Population Genetics of Encyclopedia of Life Sciences. 2007. Slide Credits: Marc Schaub

Lactose Intolerance. Figure panels: LCT, 5'; LCT, 3'. Source: Bersaglieri et al. Am. J. Hum. Genet. 2004. Slide Credits: Marc Schaub
