Fei Lu, Postdoctoral Associate, Cornell University


http://www.maizegenetics.net

Genotyping by sequencing (GBS) is simple and cost-effective
1. Digest DNA
2. Ligate adapters with barcodes
3. Pool DNAs
4. PCR
5. Illumina sequencing: 500,000 reads/sample (384-plex) (Elshire et al. 2011. PLoS ONE)
GBS is a reduced representation library approach (Altshuler et al. 2000. Nature).
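Because samples are barcoded in step 2 and pooled in step 3, the reads must be split back into samples after sequencing. The sketch below assumes that each read starts with a sample barcode followed immediately by a restriction-site remnant. The barcodes and the default remnant sequences are hypothetical values chosen for illustration. The slide does not name the enzyme.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, remnants=("CAGC", "CTGC")):
    """Split pooled GBS reads back into samples by their leading barcode.

    barcodes: dict of barcode sequence -> sample name (hypothetical values).
    remnants: restriction-site remnant expected right after the barcode.
        The default is an illustrative assumption, not taken from the slide.
    Returns (dict of sample -> reads with the barcode removed, unassigned reads).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            rest = read[len(barcode):]
            # A read matches when it starts with the barcode and the cut-site
            # remnant follows immediately; the first matching barcode wins.
            if read.startswith(barcode) and rest.startswith(remnants):
                by_sample[sample].append(rest)
                break
        else:
            unassigned.append(read)
    return dict(by_sample), unassigned
```

Checking for the remnant as well as the barcode reduces false assignments when a read begins with a barcode-like sequence by chance.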

Universal Network Enabled Analysis Kit (UNEAK)
A reference-free SNP calling pipeline designed for species that:
- lack a reference genome
- are diploid or polyploid
- are inbreeders or outcrossers
- have limited genetic or genomic resources

Overview of UNEAK
A. The genome is digested and sequenced using GBS; reads are trimmed to 64 bp.
B. Identical reads are collapsed into one tag.
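Steps A and B amount to trimming reads and then counting identical sequences. Below is a minimal sketch. It assumes that reads shorter than the tag length, and trimmed reads containing an ambiguous base (N), are discarded. That filtering rule belongs to this sketch and is not taken from the slide.

```python
from collections import Counter

TAG_LENGTH = 64  # UNEAK trims reads to 64 bp


def reads_to_tags(reads, tag_length=TAG_LENGTH):
    """Trim reads to tag_length and count identical sequences (tags).

    Assumption for this sketch: reads shorter than tag_length, or whose
    trimmed sequence contains an ambiguous base (N), are discarded.
    """
    counts = Counter()
    for read in reads:
        tag = read[:tag_length]
        if len(tag) < tag_length or "N" in tag:
            continue
        counts[tag] += 1
    return counts
```

The resulting tag-to-count table is what the later steps compare. Each distinct 64-bp sequence becomes one tag.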

Overview of UNEAK: network filter
C. Pairwise alignment finds tag pairs with a 1 bp mismatch.
D. Tag networks are built from these pairs.
E. The topology of each tag network is examined.
F. Common reciprocal tags are kept as real tags; low-count tags are treated as errors.
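Steps C and D can be sketched as follows. The slide describes pairwise alignment. For equal-length tags without gaps, the same set of 1-mismatch pairs can be found faster by bucketing each tag under every "one position deleted" variant, and this sketch swaps in that technique. The tag networks are then the connected components of the pair graph, found here with union-find. This is an illustration, not the UNEAK source code.

```python
from collections import defaultdict


def one_mismatch_pairs(tags):
    """Find pairs of distinct, equal-length tags that differ at exactly one base.

    Rather than aligning every pair, each tag is bucketed under
    (position, sequence with that position deleted); two distinct tags share
    a bucket only if they differ at that single position.
    """
    buckets = defaultdict(list)
    for tag in set(tags):
        for i in range(len(tag)):
            buckets[(i, tag[:i] + tag[i + 1:])].append(tag)
    pairs = set()
    for members in buckets.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                pairs.add(tuple(sorted((members[a], members[b]))))
    return pairs


def tag_networks(tags, pairs):
    """Group tags connected by 1-mismatch pairs (union-find); singletons are dropped."""
    parent = {tag: tag for tag in tags}

    def find(tag):
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for tag in tags:
        groups[find(tag)].add(tag)
    return [group for group in groups.values() if len(group) > 1]
```

A network of two tags is a candidate SNP. Larger networks point to repeats, paralogs, or sequencing errors.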

Topology of tag networks
Figure: networks of 2,496 tags (legend: tag, error). Network classes: plastid and highly repetitive tags; moderately repetitive tags, paralogs, and SNPs.

Details about the network filter
Figure panels: error tolerance; SNP.
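One plausible reading of the error-tolerance filter is sketched below, and it is an assumption rather than the documented UNEAK rule. A tag is treated as a sequencing error when its read count falls below `error_tolerance` times the count of a 1-mismatch neighbour. After the errors are removed, a network is kept only if it reduces to a reciprocal pair, meaning two tags that are each other's only neighbour. The parameter name and the 0.03 default are illustrative.

```python
from collections import defaultdict


def network_filter(counts, pairs, error_tolerance=0.03):
    """Keep reciprocal tag pairs as candidate SNP alleles (hypothetical rule).

    counts: dict of tag -> read count.
    pairs: iterable of (tag_a, tag_b) tags that differ by one base.
    error_tolerance: a tag whose count is below error_tolerance times the
        count of any neighbour is treated as a sequencing error. The value
        0.03 is illustrative only.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    errors = {tag for tag, nbrs in neighbours.items()
              if any(counts[tag] < error_tolerance * counts[n] for n in nbrs)}
    # Drop error tags, then look at what each remaining tag is still linked to.
    kept = {tag: nbrs - errors for tag, nbrs in neighbours.items() if tag not in errors}
    snp_pairs = []
    for tag, nbrs in kept.items():
        if len(nbrs) == 1:
            (other,) = nbrs
            # Reciprocal: each tag's only remaining neighbour is the other.
            if kept[other] == {tag} and tag < other:
                snp_pairs.append((tag, other))
    return sorted(snp_pairs)
```

With this rule, a low-count tag hanging off a real allele is discarded as an error. A triangle of three well-supported tags, which is typical of repeats or paralogs, yields no SNP.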

Program flowchart of UNEAK
Fastq/Qseq → TagCount → Network filter → TagPair → TBT (Byte/Bit) → MapInfo → HapMap (optional filters)
- TagPair: (Long, Long, Integer), i.e. (Seq, Seq, Order)
- MapInfo includes: SNP, Seq, Count, Count distribution, Heterozygote code
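Turning a tag pair and per-sample tag counts into genotype calls can be sketched as follows. Heterozygotes are written as IUPAC ambiguity codes, as is common in HapMap-format genotype files. The calling rule here is deliberately naive and is an assumption: any read of both alleles is called heterozygous. The slide does not say how UNEAK assigns the heterozygote code.

```python
# IUPAC ambiguity codes for heterozygous bi-allelic SNPs (HapMap-style output)
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
         frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W"}


def snp_alleles(tag_a, tag_b):
    """Return (position, base in tag_a, base in tag_b) for a 1-mismatch tag pair."""
    diffs = [i for i, (x, y) in enumerate(zip(tag_a, tag_b)) if x != y]
    if len(tag_a) != len(tag_b) or len(diffs) != 1:
        raise ValueError("tags must have equal length and differ at one position")
    i = diffs[0]
    return i, tag_a[i], tag_b[i]


def call_genotype(count_a, count_b, base_a, base_b):
    """Naive per-sample call from the read counts of the two allele tags.

    Assumption: any read of both tags means heterozygous. Real pipelines weigh
    depth, because at low coverage a heterozygote often shows only one allele.
    """
    if count_a == 0 and count_b == 0:
        return "N"  # missing
    if count_a > 0 and count_b > 0:
        return IUPAC[frozenset((base_a, base_b))]
    return base_a if count_a > 0 else base_b
```

At the roughly 500,000 reads per sample quoted for GBS, many loci are covered by only a few reads. Under this rule, some heterozygotes will therefore be called homozygous.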

Pipeline validated with a maize inbred linkage population
Evaluation criteria, after Step 1 (pairwise alignment of tags) and after Step 2 (network filter):
- Single-locus rate (BLAST to maize): 23.30% after Step 1, 87.26% after Step 2
- Allele frequency distribution. Figure: proportion of SNPs vs. allele frequency, one panel per step
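The allele frequency panels plot the proportion of SNPs that fall into each frequency bin. A minimal way to compute that from genotype data is shown below. It assumes genotypes coded as per-sample allele dosages (0, 1, or 2, with None for missing). The bin count is a free choice: 25 bins give a width of 0.04.

```python
def allele_frequency(dosages):
    """Frequency of the focal allele from per-sample dosages.

    dosages: copies of the focal allele per sample (0, 1 or 2), None if missing.
    Returns None when every sample is missing.
    """
    observed = [d for d in dosages if d is not None]
    if not observed:
        return None
    return sum(observed) / (2 * len(observed))


def frequency_histogram(freqs, n_bins=25):
    """Proportion of SNPs falling into each of n_bins equal-width frequency bins.

    A frequency of exactly 1.0 is placed in the last bin.
    """
    hist = [0] * n_bins
    for f in freqs:
        hist[min(int(f * n_bins), n_bins - 1)] += 1
    return [c / len(freqs) for c in hist]
```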

Characterization of the Genetic Diversity of Switchgrass Using Genotyping by Sequencing (GBS)

GWAS and GS require high-density markers to accelerate breeding
SNP discovery → Genome-Wide Association Study (GWAS) and Genomic Selection (GS) → accelerated switchgrass breeding

Challenges and goals
Challenges:
- No reference genome
- Multiple ploidy levels (4X, 6X, and 8X)
- Highly heterozygous
Goals:
- Discover high-density SNPs
- Construct a linkage disequilibrium (LD) map
- Evaluate population structure
- Reconstruct phylogeny

Switchgrass data set
Linkage populations:
- Full-sib population, n = 130 individuals
- Half-sib population, n = 168 individuals
350 GB of sequence
Association populations:
- 66 diverse populations, mostly northern-adapted Upland populations and cultivars, n = 540 individuals
720,000 SNPs generated!

Tetraploid switchgrass behaves like a diploid
Figure: proportion of SNPs vs. allele frequency in the full-sib (F1) population, 50,000 SNPs. Peaks at 1:3, 1:1, and 3:1 are labeled with the parental genotypes AA × Aa (1:3); AA × aa and Aa × Aa (1:1); and aa × Aa (3:1). Figure annotation: most informative markers to construct the linkage map.
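The peak positions follow from Mendelian segregation. Each parent passes on a random half of its allele copies, so the expected frequency of an allele among the F1 progeny is the mean of the two parental frequencies. The worked example below reproduces the 1:3, 1:1, and 3:1 peaks (frequencies 1/4, 1/2, and 3/4). It also shows, for comparison, where a simplex marker in a tetrasomic tetraploid would fall (AAAa × AAAA gives 1/8). That value is a reference point of this example only, not a result from the slide.

```python
from fractions import Fraction


def progeny_allele_frequency(parent1, parent2, allele="a"):
    """Expected frequency of `allele` among F1 progeny of two parents.

    Each parent transmits a random half of its allele copies, so the expected
    frequency in its gametes equals its own allele frequency; the progeny
    frequency is the mean of the two parental frequencies. Genotypes are
    strings such as "Aa" (diploid) or "AAAa" (tetraploid).
    """
    p1 = Fraction(parent1.count(allele), len(parent1))
    p2 = Fraction(parent2.count(allele), len(parent2))
    return (p1 + p2) / 2


for p1, p2 in [("AA", "Aa"), ("AA", "aa"), ("Aa", "Aa"), ("aa", "Aa")]:
    f = progeny_allele_frequency(p1, p2)
    print(f"{p1} x {p2}: frequency of a = {f}")
```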

18 linkage groups match the haploid chromosome number of tetraploid switchgrass (2n = 4x = 36)
Figure: correlation (R) among linkage groups; 3,000 high-coverage SNPs
Can we order the SNPs? Yes, using synteny.
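The correlation figure suggests that markers were grouped by how strongly their genotypes co-vary across the mapping population. The slide does not give the grouping method. The sketch below is therefore an assumed approach. It links two markers when the absolute Pearson correlation of their genotype scores reaches a threshold, so that repulsion-phase markers also join. The linkage groups are the connected components that result. The 0.8 threshold is illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either marker is monomorphic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def linkage_groups(markers, threshold=0.8):
    """Cluster markers into linkage groups by genotype correlation.

    markers: dict of marker name -> genotype scores for the same individuals
        (no missing data in this sketch).
    Markers with |r| >= threshold are linked; connected components are the
    linkage groups. The threshold 0.8 is an illustrative assumption.
    """
    names = list(markers)
    parent = {m: m for m in names}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(names, 2):
        if abs(pearson(markers[a], markers[b])) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for m in names:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())
```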

Linkage groups match syntenic chromosomes of foxtail millet (Setaria italica)
- Foxtail millet: small (490 Mb) diploid genome, n = 9, 13 million years divergent from switchgrass
- 10% of switchgrass SNPs map to the foxtail millet genome
- A linkage map of 18 groups was constructed with 1,401 markers
Figure: linkage groups of switchgrass vs. chromosomes of foxtail millet
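The synteny-based ordering can be sketched as follows. Each linkage group is assigned to the foxtail millet chromosome that most of its markers hit. The markers on that chromosome are then sorted by their physical position. The input dictionaries are hypothetical: they assume the alignment of switchgrass tags to the foxtail millet genome has already been done, keeping one hit per tag. Because 18 groups map to 9 chromosomes, two switchgrass groups would be expected to share each foxtail millet chromosome.

```python
from collections import Counter, defaultdict


def order_by_synteny(marker_group, hits):
    """Order markers within each linkage group by syntenic position.

    marker_group: dict of marker -> linkage group id.
    hits: dict of marker -> (foxtail millet chromosome, position); markers
        without a hit are left out (hypothetical input format).
    Returns dict of group -> (assigned chromosome, ordered markers).
    """
    by_group = defaultdict(list)
    for marker, group in marker_group.items():
        if marker in hits:
            by_group[group].append(marker)
    result = {}
    for group, markers in by_group.items():
        # Majority vote decides which chromosome the group is syntenic with.
        chrom = Counter(hits[m][0] for m in markers).most_common(1)[0][0]
        syntenic = [m for m in markers if hits[m][0] == chrom]
        result[group] = (chrom, sorted(syntenic, key=lambda m: hits[m][1]))
    return result
```

Markers whose hit lands on a different chromosome from the rest of their group are dropped here. That is a conservative choice for the sketch.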

Upland and lowland ecotypes clearly separate in the phylogeny
Figure: tree with Upland and Lowland clades; labeled accessions include Jackson, MI; Hansens Island, MI; Tipton, IN; Fillmore, MN; Genesee, MN; Ipswich Prairie, WI; and WS4U (detail).

Ploidy levels resolve into distinct groups
Figure: tree groups labeled Upland 4X, Upland 8X (three groups), and Lowland 4X (two groups).
Ploidy level was identified by flow cytometry (Costich et al. 2010. Plant Genome).

Geography shows isolation by distance
Figure: geographic distribution of the groups Upland 4X North; Upland 8X East, West, and South; Lowland 4X South and Northeast.
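Isolation by distance is often tested with a Mantel test. This test correlates pairwise genetic distances with pairwise geographic distances and assesses significance by permuting sample labels. The slide does not say which test was used, so the sketch below is a generic version. Its inputs are assumed to be square, symmetric distance matrices in the same sample order.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=1):
    """Simple Mantel test: correlation between two square distance matrices.

    Significance comes from permuting the rows and columns of the genetic
    matrix together; returns (r, one-sided p-value for positive correlation).
    """
    n = len(genetic)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo = [geographic[i][j] for i, j in pairs]

    def r_for(order):
        return pearson(geo, [genetic[order[i]][order[j]] for i, j in pairs])

    observed = r_for(list(range(n)))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        if r_for(order) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)
```

With only a handful of population groups, the number of distinct permutations is small, so the attainable p-values are coarse.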

Upland 4X arose from Upland 8X
Figure: (a) NJ tree using 7,000 markers; (b) NJ tree using 29,921 markers; foxtail millet is the outgroup. Clades: Upland (Upland 8X East, Upland 8X West, Upland 8X South, Upland 4X North) and Lowland (Lowland 4X South, Lowland 4X Southeast), with numeric support values on the nodes.
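For readers unfamiliar with NJ trees, here is a compact implementation of the standard neighbor-joining algorithm of Saitou and Nei. It starts from a pairwise distance matrix. The slide does not state which distance measure or software produced the trees, so the input here is any symmetric distance matrix. The output is an unrooted Newick string. Branch lengths are printed with six significant digits, and negative lengths, which can arise from non-additive data, are not clamped.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree with the neighbor-joining algorithm (Saitou and Nei).

    names: taxon labels (no spaces, commas, colons or parentheses, so the
        Newick output stays valid).
    dist: symmetric matrix of pairwise distances, as a list of lists.
    Returns the tree as a Newick string with branch lengths.
    """
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(((a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from each joined node to the new internal node.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new = f"({i}:{li:g},{j}:{lj:g})"
        d[new] = {new: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    return f"({a}:{d[a][b]:g},{b}:0);"
```

Rooting the tree on an outgroup, as the foxtail millet sample does in the figure, is a separate step after the unrooted tree is built.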

Reduced diversity in Upland 4X compared with Upland 8X
Figure: MDS plot (Coordinate 1 vs. Coordinate 2) of Upland 8X East, Upland 8X West, Upland 8X South, and Upland 4X North.
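One common way to quantify a statement such as "reduced diversity" is gene diversity, the mean of 2p(1 - p) over SNPs, computed within each group. The slide draws its conclusion from an MDS plot and does not name a diversity statistic, so this is an illustrative choice. The sketch also assumes diploid-style dosages (0, 1, or 2) with no missing data. That is a simplification for polyploid GBS calls, where allele dosage is usually unknown.

```python
from collections import defaultdict


def expected_heterozygosity(freqs):
    """Mean 2p(1 - p) over bi-allelic SNPs (gene diversity)."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def diversity_by_group(dosages, groups):
    """Gene diversity per group.

    dosages: dict of sample -> list of allele dosages (0, 1 or 2) per SNP,
        complete data assumed.
    groups: dict of sample -> group label (e.g. "Upland 4X North").
    """
    members = defaultdict(list)
    for sample, label in groups.items():
        members[label].append(dosages[sample])
    result = {}
    for label, rows in members.items():
        # Allele frequency of each SNP within this group only.
        freqs = [sum(col) / (2 * len(rows)) for col in zip(*rows)]
        result[label] = expected_heterozygosity(freqs)
    return result
```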

Migration paths of switchgrass
Figure: map of migration paths among Upland 4X North; Upland 8X East, West, and South; Lowland 4X South and Northeast.

Summary
- An effective SNP calling pipeline was developed
- It works well for species that lack a reference genome, are heterozygous, or are polyploid
- 720,000 high-density SNPs were discovered for GWAS
- Tetraploid switchgrass behaves like a diploid
- A synteny-based SNP map was constructed with low-coverage GBS markers
- A robust phylogeny agrees well with the ecotype, ploidy level, and geographic distribution of switchgrass
- The data suggest that Upland 4X arose from Upland 8X

Future directions
Putting it all together: GWAS and GS
- Field site: Caldwell Field, Cornell University, Ithaca, NY
- Traits: flowering time, plant height, leaf length and width, standability, biomass quality traits
- Populations: linkage populations and association populations

Acknowledgements
Project Manager: Denise Costich (USDA ARS, Cornell)
PIs: Edward Buckler (USDA ARS, Cornell), Michael Casler (USDA ARS, UW Madison), Jerome Cherney (Cornell)
Sequencing: Rob Elshire, Jeff Glaubitz, Wenyan Zhu, Institute for Genomic Diversity (Cornell)
Statistics: Alex Lipka
Bioinformatics: Dallas Kroon
Field: Ken Paddock, Nick Lepak, Nick Kaczmar
Supported by DOE (including JGI), USDA, and NSF