Adaptation in the Human Genome. HapMap. The HapMap is a Resource for Population Genetic Studies. Single Nucleotide Polymorphism (SNP)

Similar documents
7. Tests for selection

Processes of Evolution

Dietary change and adaptive evolution of enamelin in humans and among primates. Joanna L. Kelley* and Willie Swanson. Department of Genome Sciences

Genetic Drift in Human Evolution

Lecture Notes: BIOL2007 Molecular Evolution

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

WHOLE-GENOME scans make it possible to systematically

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly

Classical Selection, Balancing Selection, and Neutral Mutations

Genotype Imputation. Class Discussion for January 19, 2016

(Write your name on every page. One point will be deducted for every page without your name!)

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Understanding relationship between homologous sequences

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Neutral Theory of Molecular Evolution

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Bioinformatics. Genotype -> Phenotype DNA. Jason H. Moore, Ph.D. GECCO 2007 Tutorial / Bioinformatics.

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Full file at CHAPTER 2 Genetics

Molecular Evolution & the Origin of Variation

Molecular Evolution & the Origin of Variation

Heredity and Genetics WKSH

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION

Robust demographic inference from genomic and SNP data

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Genotype Imputation. Biostatistics 666

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

CHAPTER 23 THE EVOLUTIONS OF POPULATIONS. Section C: Genetic Variation, the Substrate for Natural Selection

Lesson 4: Understanding Genetics

p(d g A,g B )p(g B ), g B

Exam 1 PBG430/

Designer Genes C Test

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

Febuary 1 st, 2010 Bioe 109 Winter 2010 Lecture 11 Molecular evolution. Classical vs. balanced views of genome structure

BIOINFORMATICS. Discrepancies in dbsnp confirmation rates and allele frequency distributions from varying genotyping error rates and patterns

Linear Regression (1/1/17)

BIOLOGY STANDARDS BASED RUBRIC

Processes of Evolution

2. Der Dissertation zugrunde liegende Publikationen und Manuskripte. 2.1 Fine scale mapping in the sex locus region of the honey bee (Apis mellifera)

Bio/Life: Cell Biology

Cladistics and Bioinformatics Questions 2013

Learning ancestral genetic processes using nonparametric Bayesian models

D. Incorrect! That is what a phylogenetic tree intends to depict.

Curriculum Links. AQA GCE Biology. AS level

5/4/05 Biol 473 lecture

1. Understand the methods for analyzing population structure in genomes

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

Evolu&on, Popula&on Gene&cs, and Natural Selec&on Computa.onal Genomics Seyoung Kim

Estimating Evolutionary Trees. Phylogenetic Methods

Peddie Summer Day School

Calculation of IBD probabilities

Computational Biology: Basics & Interesting Problems

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

Introduction to Linkage Disequilibrium

Big Idea #1: The process of evolution drives the diversity and unity of life

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Coding sequence array Office hours Wednesday 3-4pm 304A Stanley Hall

Chapter 17. From Gene to Protein. Biology Kevin Dees

Miller & Levine Biology

Introduction to Advanced Population Genetics

Levels of genetic variation for a single gene, multiple genes or an entire genome

Objective 3.01 (DNA, RNA and Protein Synthesis)

Genomes and Their Evolution

Biology Semester 2 Final Review

Gene expression in prokaryotic and eukaryotic cells, Plasmids: types, maintenance and functions. Mitesh Shrestha

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Darwinian Selection. Chapter 7 Selection I 12/5/14. v evolution vs. natural selection? v evolution. v natural selection

Unit 3 - Molecular Biology & Genetics - Review Packet

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Introduction to Natural Selection. Ryan Hernandez Tim O Connor

Lecture 18 - Selection and Tests of Neutrality. Gibson and Muse, chapter 5 Nei and Kumar, chapter 12.6 p Hartl, chapter 3, p.

A novel method with improved power to detect recombination hotspots from polymorphism data reveals multiple hotspots in human genes

Life Cycles, Meiosis and Genetic Variability24/02/2015 2:26 PM

Campbell Biology Concepts & Connections 2015

Mutation, Selection, Gene Flow, Genetic Drift, and Nonrandom Mating Results in Evolution

Gene expression differences in human and chimpanzee cerebral cortex

POPULATION GENETICS Biology 107/207L Winter 2005 Lab 5. Testing for positive Darwinian selection

Cell Division: the process of copying and dividing entire cells The cell grows, prepares for division, and then divides to form new daughter cells.

Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype.

RNA Processing: Eukaryotic mrnas

Basic Biology. Content Skills Learning Targets Assessment Resources & Technology

What can sequences tell us?

GO ID GO term Number of members GO: translation 225 GO: nucleosome 50 GO: calcium ion binding 76 GO: structural

Biology Final Review Ch pg Biology is the study of

Biology Scope and Sequence Student Outcomes (Objectives Skills/Verbs)


COMPARATIVE PRIMATE GENOMICS


UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

1 Errors in mitosis and meiosis can result in chromosomal abnormalities.

Introduction to Bioinformatics

Untitled Document. A. antibiotics B. cell structure C. DNA structure D. sterile procedures

Theory a well supported testable explanation of phenomenon occurring in the natural world.

Formative/Summative Assessments (Tests, Quizzes, reflective writing, Journals, Presentations)

What is the cause of water molecules being drawn together? A67. Cohesion S13. Gravity A26. Adhesion N38. Electromagnetism

Transcription:

Adaptation in the Human Genome A genome-wide scan for signatures of adaptive evolution using SNP data Single Nucleotide Polymorphism (SNP) A nucleotide difference at a given location in the genome Joanna Kelley GE44 Feb 7 GTAAGCCTAC GTACGCCTAC Discovering SNPs in the Human Genome BAC Library BAC Overlap Genomic DNA Random Shotgun Sequencing Shotgun Overlap GTTTAAATAATACTGATCA GTTTAAATAATACTGATCA GTTTAAATAGTACTGATCA GTTTAAATAGTACTGATCA mrna cdna Library EST Overlap Sequence Overlap - SNP discovery ~ Million SNPs Available http://www.ncbi.nlm.gov/snp/ HapMap Genetic resource for developing association maps Compare genotype patterns between individuals and populations Populations Nigerian Japanese Chinese European (Individuals from Utah - CEPH) Total number of genotyped SNPs released (Jan 7): 3,94,8 Approximately SNP per base pairs Debbie Nickerso Screen Capture of HapMap Gene: annexin A The HapMap is a Resource for Population Genetic Studies SNP data can be used to identify natural selection Genome-wide scan for regions of adaptive evolution Selective sweeps Balancing selection

Studying natural selection Population demographics Neutral Selective Sweep Advantageous Mutation Neutral Mutation decreases New Mutation genetic variation relative to neutral expectations Look for an excess of rare alleles using Tajima s D Nucleotide polymorphism, θ Nucleotide diversity, π Affects entire genome, not just one locus Nucleotide Polymorphism, θ number of polymorphic nucleotides, normalized for sample size n- θ = S / ( i= (/i)) tcagaaccatgctgccattcatgcatctgctcaaatcaatcacttctgcaatgccatcat tcagaaccatgctgcgattcatgcatcagctcaaatcaatcacttctgcaatgccatcat tcagaaccatgctgcgattcatgcatcagctcaaatcaatcacttcttcaatgccatcat tcagaaccatgctgcgattcatgcatctgctcaaatcaatcacttctgcaatgccatcat Nucleotide diversity, π proportion of nucleotides that differ between two random sequences in a sample π = ( n / (n-) ) ij x i x j p ij x i, x j - frequencies of ith and j th sequences tcagaaccatgctgccattcatgcatctgctcaaatcaatcacttctgcaatgccatcat tcagaaccatgctgcgattcatgcatcagctcaaatcaatcacttctgcaatgccatcat tcagaaccatgctgcgattcatgcatcagctcaaatcaatcacttcttcaatgccatcat tcagaaccatgctgcgattcatgcatctgctcaaatcaatcacttctgcaatgccatcat 3 Segregating (polymorphic) Sites n = 4 (number of sequences) θ =.7 Note: θ does not depend on nucleotide frequency p ij - proportion of different nucleotides between two sequences Seq Seq Seq3 Seq4 Seq 3 Seq Seq3 3 Seq4 n = 4 (number of sequences) π =.67 Note: π depends on nucleotide frequencies Tajima s D: Relation between θ and π Under neutrality, expect polymorphism and diversity to be equal (θ = π) Tajima's D is a test statistic that tests for deviations from this expectation Tajima s D >, excess of high frequency (common) alleles Heterozygote advantage, balancing selection Population bottleneck Population structure Tajima s D <, excess of low frequency (rare) alleles Directional selection Population expansion Population admixture Methods Download SNP information from SNP database (dbsnp) Create a list of SNPs that map to gene regions Download genotype data from HapMap Analyze the data Calculate #SNPs (sites) per gene, Tajima s D, etc.

CHN Distribution of Genes by Tajima s D Genes show dramatic differences in allele frequencies between populations Proportion of genes Positive Selection? Balancing Selection? AA EA EA AA 36 36 5 3 53 CHN N = 3 Tajima s D Homozygous common Heterozygous Homozygous rare Distribution of genes by Tajima s D What do you expect Tajima s D to look like in non-gene regions? Proportion Genic Non-Genic Gene regions have similar demographic history and ascertainment as non-genic regions Comparison of genic and simulated non-genic regions show excess of genes with highly negative Tajima s D Identified 385 genes showing evidence of positive selection Tajima s D Tajima s D across Chromosome Tajima s D Chromosome Position (Mb) Cluster Entire gene set Tajima s D < - Tajima s D > 4 Many of the genes in the tails are hypothetical genes Good targets for comparative sequencing Comparison of Molecular Function of genes Cell adhesion molecule Cell junction protein Chaperone Cytoskeletal protein Defense/immunity protein Extracellular matrix Hydrolase Ion channel Isomerase Kinase Ligase Lyase Membrane traffic protein Miscellaneous function Molecular function unclassified Nucleic acid binding Oxidoreductase Phosphatase Protease Receptor Select calcium binding protein Select regulatory molecule Signaling molecule Synthase and synthetase Transcription factor Transfer/carrier protein Transferase Transporter Viral protein 3

Verification Using Sequence Data Three of the genes have been verified previously by SeattleSNPs and EGP Targeted sequencing of 5 genes Genes with: Tajima s D < -. Tajima s D > 4..5 kb each, targeted to HapMap polymorphic sites ENAM - dental enamelin PKD-like - polycystic kidney disease -like POLL - Polymerase lamda RAGE - Renal tumor antigen An in-depth study: enamelin Extreme population differences Ecological relevance Potential for phenotypic correlation What is the evolutionary history of enamelin? Tooth enamel properties Tooth enamel thickness Tooth enamel thickness used in characterizing hominoid fossils Thickness varies from tooth to tooth, individual to individual and species to species Enamel thickness is heritable (Hlusko et al. 4) Documented human population specific differences in enamel thickness (Harris et al. ) Hardest, most mineralized tissue in human body Made up of small mineralized crystallites 85% mineral by volume Diet correlated to enamel thickness Carnivores / hard-object feeders = thick enamel Herbivores / soft-object feeders = thin enamel Enamel thickness examples enamelin and tooth enamel formation Herbivore Carnivore 8,75 base pairs 4 amino acids Encodes secretory protein Largest protein in enamel matrix Comprises 5% of total enamel matrix protein Function Involved in determining enamel crystallite growth and length Critical for proper enamel formation Mutations lead to thin, malformed enamel Enamel thickness 4

Evolutionary history of enamelin enamelin primate species tree Identified by polymorphism study Tajima s D Confounded by population demographics or ascertainment biases Need to confirm Direct sequencing Other methods: d N /d S Evolutionary selection between species Synonymous nucleotide substitution (d S ) amino acid remains unchanged Non-synonymous substitution (d N ) encoded amino acid changes No selection (Neutrality): d N / d S = Purifying selection: d N / d S < Positive selection: d N / d S > d N / d S - measure of selective pressure d N /d S Model Comparisons Compare likelihood of neutral vs. selection models Neutral model d N / d S Selection model Assume beta distribution Beta distribution, d N / d S estimated of d N / d S in interval (,) (one unrestricted site class) (all site classes fall within (,) Frequency Frequency Methods of Nielsen and Yang (998), Yang et al. () d N / d S d N /d S sites model analysis Predicted sites under selection Models Compared Neutral (M) vs. Selection (M) One-ratio (M) vs. Discrete (M3) Beta (M7) vs. Beta & " (M8) ** p<. -! lnl 35.4** (df = ) 63.** (df = 4) 35.6** (df = ) d N /d S estimates % sites under selection 6.76 4.3 6.76 4.3 6.8 4.3 Charged changes Cleavage products 5

Have specific lineages been subject to positive selection? Method to test lineage specific selection Correlations exist between diet and enamel thickness in primates Are specific lineages with dietary changes correlated to bursts of adaptive evolution? Reconstruct ancestral diet Identify lineages with dietary shift Hypothesis: selection on lineages with dietary shift Test for selection Neutral model branches w = Selection model branches w = free Bursts of adaptive evolution are correlated to lineages with diet change Existing evidence for molecular change tracking diet change Lysozyme and pancreatic RNAse (Messier & Stewart 997, Yang 998, Zhang 3, Zhang 6) Specific activity in foregut fermenting species (ruminants and colobine monkeys) Adaptive bursts coinciding with dietary changes enamelin primate evolution Primate d N /d S analysis indicates adaptive evolution N. European Russian Middle East N. American European Human Population Variation enamelin molecular change tracks with dietary change What is the pattern of adaptive evolution within humans? Andes Japanese Taiwanese N. Saharan S. Saharan Mbuti Tribe S. American Asian N. Saharan S. Saharan Homozygous common Homozygous rare Heterozygous Missing data 6

Distribution of SNPs in enamelin Derived allele at high frequency in non-african populations Non-synonymous derived allele and enamel thickness European American African American Allele frequency Documented population specific differences in tooth enamel thickness (Harris et al. ) Exon Non-coding SNP Synonymous SNP Non-synonymous SNP Mean enamel thickness 5.4 mm 5.56 mm Preliminary Results Association study: non-synonymous SNP and enamel thickness African and African American patients Bite-wing radiographs for enamel thickness measurements Cheek swabs for DNA sequencing Statistical analysis for association study GG N=7 Ilona Khosh, dental student, taking a cheek swab Conclusions Primate analysis indicates adaptive evolution Human polymorphism data provide evidence for human population specific adaptation Association study between genotype and enamel thickness phenotype GC N=8 CC N=4 Acknowledgements Willie Swanson Debbie Nickerson Josh Akey Bob Waterston Frank Roberts Evan Eichler Tracy Popowics Frank Roberts Ilona Khosh Diane Daubert Cindy Desmarais Kiran Dhillon Swanson Lab Sigma Xi NIH Genome Training Grant 7