Fei Lu, Postdoctoral Associate, Cornell University
http://www.maizegenetics.net
Genotyping by sequencing (GBS) is simple and cost effective
1. Digest DNA
2. Ligate adapters with barcodes
3. Pool DNAs
4. PCR
5. Illumina sequencing
Reduced representation library approach (Altshuler et al. 2000. Nature).
500,000 reads/sample at 384-plex (Elshire et al. 2011. PLoS ONE).
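Because the DNAs are pooled after barcoded adapters are ligated (steps 2 and 3), every read has to be assigned back to its sample by the barcode at its start. The sketch below shows that demultiplexing step under stated assumptions: the barcode table is hypothetical, each read is assumed to begin with its barcode, and barcodes of different lengths are tried longest first so a short barcode cannot steal a read that carries a longer one.

```python
from collections import defaultdict

# Hypothetical barcode -> sample table (not taken from the slides).
BARCODES = {"ACGT": "sample1", "TGCAAT": "sample2", "GGATC": "sample3"}


def demultiplex(reads, barcodes):
    """Assign each pooled read to a sample by its barcode prefix.

    Barcodes are tried longest first and stripped from the read.
    Reads with no matching barcode are collected under the key None.
    """
    ordered = sorted(barcodes, key=len, reverse=True)
    by_sample = defaultdict(list)
    for read in reads:
        for bc in ordered:
            if read.startswith(bc):
                by_sample[barcodes[bc]].append(read[len(bc):])
                break
        else:  # no barcode matched
            by_sample[None].append(read)
    return dict(by_sample)
```

For example, `demultiplex(["ACGTCAGCTTAA", "TTTTTTTT"], BARCODES)` puts `"CAGCTTAA"` under `sample1` and the unmatched read under `None`.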
Universal Network Enabled Analysis Kit (UNEAK)
A reference-free SNP calling pipeline designed for species that lack a reference genome, are diploid or polyploid, are inbreeders or outcrossers, and have limited genetic or genomic resources.
Overview of UNEAK
A. The genome is digested and sequenced using GBS; reads are trimmed to 64 bp.
B. Identical reads are collapsed into a single tag.
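Steps A and B amount to trimming every read to a fixed length and counting identical sequences. A minimal sketch, assuming reads shorter than the tag length are simply skipped (how the real pipeline treats short reads is not stated on the slide):

```python
from collections import Counter

TAG_LENGTH = 64  # reads are trimmed to 64 bp, as on the slide


def reads_to_tags(reads, length=TAG_LENGTH):
    """Trim reads to `length` and collapse identical sequences into tags.

    Returns a Counter mapping tag sequence -> number of reads.
    Reads shorter than `length` are skipped (an assumption of this sketch).
    """
    return Counter(r[:length] for r in reads if len(r) >= length)
```

With a 4 bp tag length for readability, `reads_to_tags(["ACGTAA", "ACGTCC", "ACGGTT", "AC"], length=4)` gives the tag `ACGT` with count 2 and `ACGG` with count 1; these counts are what the later network filter uses.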
Overview of UNEAK: network filter
C. Pairwise alignment finds tag pairs with a 1 bp mismatch.
D. Build tag networks.
E. Inspect the topology of tag networks (tag count distinguishes real tags from errors).
F. Keep common, reciprocal tag pairs.
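Steps C and D can be sketched as follows. Because every tag is trimmed to the same length, this sketch replaces "pairwise alignment" with a plain Hamming-distance check (substitutions only, no indels), which is an assumption; it also uses a brute-force O(n²) comparison and a union-find structure to gather connected tags into networks. It is not the UNEAK implementation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def one_mismatch_pairs(tags):
    """All tag pairs that differ at exactly one base (brute force)."""
    return [(a, b) for a, b in combinations(sorted(tags), 2)
            if len(a) == len(b) and hamming(a, b) == 1]


def tag_networks(tags, edges):
    """Group tags into networks (connected components) joined by 1 bp edges."""
    parent = {t: t for t in tags}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for t in tags:
        groups.setdefault(find(t), set()).add(t)
    # Singleton tags have no 1 bp neighbour and cannot carry a SNP.
    return [g for g in groups.values() if len(g) > 1]
```

On toy 4 bp tags `["AAAA", "AAAT", "CCCC", "CCCG", "CCGG", "TTTT"]`, this yields one two-tag network (`AAAA`, `AAAT`) and one three-tag chain (`CCCC`, `CCCG`, `CCGG`); the chain is the kind of topology step E treats with suspicion.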
Topology of tag networks
[Figure: networks formed by 2,496 tags; legend: tag, error.]
Network types: errors; plastid and highly repetitive tags; moderately repetitive tags, paralogs, and SNPs.
Details about the network filter
[Figure: error tolerance and SNP identification within tag networks.]
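A minimal sketch of the two ideas on this slide, error tolerance and keeping reciprocal pairs, under stated assumptions. The error tolerance rate `etr` and its value are hypothetical: a tag whose count is below `etr` times its 1 bp neighbour's count is treated as a sequencing error and removed. Of what remains, a pair is kept only if each tag's single neighbour is the other, so repeat-rich or paralog networks with more complex topology are discarded.

```python
from collections import defaultdict


def network_filter(counts, edges, etr=0.03):
    """Return reciprocal tag pairs kept as candidate SNP alleles.

    counts: dict tag -> read count summed over the population.
    edges:  (tag_a, tag_b) pairs that differ by 1 bp.
    etr:    hypothetical error tolerance rate, not a value from the slides.
    """
    errors = set()
    for a, b in edges:
        low, high = sorted((a, b), key=lambda t: counts[t])
        if counts[low] < etr * counts[high]:
            errors.add(low)  # rare variant of a common tag: likely an error
    neighbours = defaultdict(set)
    for a, b in edges:
        if a not in errors and b not in errors:
            neighbours[a].add(b)
            neighbours[b].add(a)
    pairs = set()
    for a, nbrs in neighbours.items():
        if len(nbrs) == 1:
            (b,) = nbrs
            if neighbours[b] == {a}:  # reciprocal: each is the other's only neighbour
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)
```

Removing an error tag can turn a three-tag network back into a clean pair, which is why the error filter runs before the topology check in this sketch.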
Program flowchart of UNEAK
Fastq/Qseq → TagCount → network filter → TagPair → TBT (Byte/Bit) → MapInfo → HapMap (optional filters)
TagPair record: (Long, Long, Integer), i.e. Seq, Seq, Order
MapInfo includes: SNP, Seq, Count, Count distribution, Heterozygote code
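The TagPair record stores sequence as `Long` values. One common way to fit a 64 bp tag into two 64-bit integers is 2 bits per base, 32 bases per integer; the sketch below assumes that packing and an A/C/G/T = 0/1/2/3 mapping, neither of which is confirmed by the slide. Python integers are unbounded and unsigned here, whereas a Java `long` is signed, so the bit patterns match but printed values can differ.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit mapping
BASES = "ACGT"
MASK64 = (1 << 64) - 1


def encode_tag(seq):
    """Pack a 64 bp tag into two 64-bit integers (32 bases each, 2 bits/base)."""
    if len(seq) != 64:
        raise ValueError("tag must be exactly 64 bp")
    longs = []
    for half in (seq[:32], seq[32:]):
        value = 0
        for base in half:
            value = (value << 2) | CODE[base]
        longs.append(value & MASK64)
    return tuple(longs)


def decode_tag(longs):
    """Inverse of encode_tag."""
    out = []
    for value in longs:
        half = []
        for _ in range(32):
            half.append(BASES[value & 3])  # lowest 2 bits = last base
            value >>= 2
        out.append("".join(reversed(half)))
    return "".join(out)
```

Fixed-width integers make tags cheap to store, sort, and compare, which matters when millions of distinct tags are aligned pairwise.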
Pipeline validated with a maize inbred linkage population
Evaluation criterion: single locus rate (BLAST to maize)
Step 1, pairwise alignment of tags: 23.30%
Step 2, network filter: 87.26%
[Figure: allele frequency distribution of SNPs after each step; x-axis: allele frequency; y-axis: proportion of SNPs.]
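The histograms on this slide plot, for each allele-frequency bin, the proportion of SNPs falling in that bin. A minimal sketch, assuming one homozygous call per inbred individual coded 0 or 1, with `None` for missing data; the coding and the default bin width are assumptions.

```python
def allele_frequency(calls):
    """Frequency of allele 1 at one SNP in an inbred population.

    calls: one call per individual, 0 or 1 for the two homozygous
    alleles and None for missing data (an assumed coding).
    """
    observed = [c for c in calls if c is not None]
    return sum(observed) / len(observed) if observed else None


def frequency_distribution(snps, bin_width=0.04):
    """Proportion of SNPs in each allele-frequency bin (the y-axis)."""
    freqs = [f for f in map(allele_frequency, snps) if f is not None]
    n_bins = round(1 / bin_width)
    counts = [0] * n_bins
    for f in freqs:
        counts[min(int(f / bin_width), n_bins - 1)] += 1  # f = 1.0 goes in the last bin
    total = len(freqs)
    return [c / total for c in counts] if total else [0.0] * n_bins
```

SNPs with no observed calls are left out of the denominator, so the returned proportions sum to 1 over the SNPs that could be scored.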
Characterization of the Genetic Diversity of Switchgrass Using Genotyping by Sequencing (GBS)
GWAS and GS require high-density markers to accelerate breeding
SNP discovery → Genome-Wide Association Study (GWAS) and Genomic Selection (GS) → accelerated switchgrass breeding
Challenges and goals
Challenges: no reference genome; multiple ploidy levels (4X, 6X, and 8X); highly heterozygous.
Goals: discover high-density SNPs; construct a linkage disequilibrium (LD) map; evaluate population structure; reconstruct the phylogeny.
Switchgrass data set
Linkage populations: full-sib population (n = 130 individuals); half-sib population (n = 168 individuals).
Association populations: 66 diverse populations, mostly northern-adapted upland populations and cultivars (n = 540 individuals).
350 GB of sequence; 720,000 SNPs generated!
Tetraploid switchgrass behaves like a diploid
[Figure: allele frequency distribution of 50,000 SNPs in the full-sib F1 population; x-axis: allele frequency; y-axis: proportion of SNPs.]
Peaks fall at 0.25 (1:3; AA × Aa), 0.5 (1:1; AA × aa or Aa × Aa), and 0.75 (3:1; aa × Aa), the pattern expected from diploid parents.
These segregating SNPs are the most informative markers for constructing a linkage map.
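Why do peaks at only 0.25, 0.5, and 0.75 point to diploid-like behavior? The expected allele frequency among F1 progeny is the mean of the two parents' allele dosages. If a SNP locus behaves as if it has 2 allele copies per parent (disomic inheritance), only quarters are possible; with 4 copies per parent (tetrasomic inheritance) eighths such as 0.125 and 0.375 would also appear. The sketch below enumerates both cases; it is a simple illustration of that reasoning, not the analysis used on the slide.

```python
from fractions import Fraction
from itertools import product


def expected_f1_frequencies(copies_per_parent):
    """Possible expected allele frequencies at a segregating SNP in an F1 family.

    copies_per_parent: allele copies per parent at the locus
    (2 = disomic, diploid-like; 4 = tetrasomic).
    Expected progeny frequency = mean of the two parental dosages.
    """
    n = copies_per_parent
    freqs = set()
    for d1, d2 in product(range(n + 1), repeat=2):
        f = Fraction(d1 + d2, 2 * n)
        if 0 < f < 1:  # drop crosses where both parents are fixed for the same allele
            freqs.add(f)
    return sorted(freqs)
```

`expected_f1_frequencies(2)` returns 1/4, 1/2, and 3/4, matching the AA × Aa, AA × aa or Aa × Aa, and aa × Aa crosses; `expected_f1_frequencies(4)` returns every multiple of 1/8 from 1/8 to 7/8.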
18 linkage groups match the chromosome number of tetraploid switchgrass (2n = 4x = 36)
[Figure: correlation (R) between linkage groups, built from 3,000 high-coverage SNPs.]
Can we order the SNPs? Yes, using synteny.
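The correlation plot groups markers whose genotypes rise and fall together across the population. A minimal sketch of that idea, assuming numeric genotype codes with no missing data, single-linkage clustering, and a hypothetical |r| threshold; it is an illustration, not the grouping method used for the 3,000 SNPs.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: no information about linkage
    return sxy / math.sqrt(sxx * syy)


def linkage_groups(markers, r_threshold=0.5):
    """Cluster markers whose |r| reaches a threshold (single linkage).

    markers: dict name -> list of numeric genotype codes.
    r_threshold: hypothetical cutoff, not a value from the slides.
    """
    names = list(markers)
    parent = {m: m for m in names}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson_r(markers[a], markers[b])) >= r_threshold:
                parent[find(a)] = find(b)
    groups = {}
    for m in names:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())
```

Absolute r is used because markers in repulsion phase are negatively correlated but still linked.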
Linkage groups match the syntenic chromosomes of foxtail millet (Setaria italica)
Foxtail millet: small (490 Mb), diploid genome, n = 9; diverged from switchgrass 13 million years ago.
10% of switchgrass SNPs map to the foxtail millet genome.
Constructed a linkage map of 18 groups with 1,401 markers.
[Figure: linkage groups of switchgrass vs. chromosomes of foxtail millet.]
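Ordering markers by synteny can be sketched as: for each switchgrass linkage group, take the foxtail millet chromosome hit by most of its mapped markers, then sort those markers by their position on it. The input layout, the majority rule, and the chromosome and marker names below are assumptions for illustration; markers hitting a non-syntenic chromosome are simply set aside.

```python
from collections import Counter, defaultdict


def order_by_synteny(markers):
    """Order markers within each linkage group by foxtail millet position.

    markers: list of (marker, linkage_group, setaria_chrom, setaria_pos)
    for the subset of markers whose tags map to foxtail millet.
    Returns {linkage_group: (syntenic_chrom, [markers in physical order])}.
    """
    by_group = defaultdict(list)
    for rec in markers:
        by_group[rec[1]].append(rec)
    result = {}
    for group, recs in by_group.items():
        chrom, _ = Counter(r[2] for r in recs).most_common(1)[0]
        syntenic = sorted((r for r in recs if r[2] == chrom), key=lambda r: r[3])
        result[group] = (chrom, [r[0] for r in syntenic])
    return result
```

With hypothetical records `("s1", "LG1", "Si1", 300)`, `("s2", "LG1", "Si1", 100)`, `("s3", "LG1", "Si5", 50)`, and `("s4", "LG1", "Si1", 200)`, group `LG1` is assigned `Si1` and ordered `s2, s4, s1`, while `s3` is dropped as non-syntenic.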
Upland and lowland ecotypes clearly separate in the phylogeny
[Figure: phylogeny with upland and lowland clades; labeled populations include Jackson, MI; Hansens Island, MI; Tipton, IN; Fillmore, MN; Genesee, MN; and Ipswich prairie, WI; inset: WS4U detail.]
Ploidy levels resolve into distinct groups
[Figure: groups labeled Upland 4X, Upland 8X (three groups), and Lowland 4X (two groups).]
Ploidy level identified by flow cytometry (Costich et al. 2010. Plant Genome).
Geography shows isolation by distance
[Figure: geographic distribution of Upland 4X North, Upland 8X East, Upland 8X West, Upland 8X South, Lowland 4X South, and Lowland 4X Northeast.]
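Isolation by distance is commonly tested with a Mantel test: correlate pairwise genetic distances with pairwise geographic distances, then compare that correlation with the ones obtained after randomly relabeling the populations. The slide does not say which test was used, so this is a swapped-in, illustrative technique; the permutation count and seed are arbitrary choices.

```python
import math
import random


def _upper(m):
    """Upper-triangle entries (i < j) of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=1):
    """One-sided Mantel test of a positive genetic-geographic correlation.

    Returns (r, p): the observed Pearson r between the matrices' upper
    triangles and a permutation p-value from shuffling population labels
    of the geographic matrix.
    """
    n = len(genetic)
    gen = _upper(genetic)
    observed = _pearson(gen, _upper(geographic))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[geographic[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(gen, _upper(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Rows and columns are permuted together, which keeps each permuted matrix a valid distance matrix; permuting individual entries would break that structure.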
Upland 4X arose from Upland 8X
(a) NJ tree using 7,000 markers, with foxtail millet as the outgroup.
(b) NJ tree using 29,921 markers.
Upland clade: Upland 8X East, Upland 4X North, Upland 8X West, Upland 8X South. Lowland clade: Lowland 4X South, Lowland 4X Southeast.
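The trees on this slide are neighbor-joining (NJ) trees. The sketch below implements the standard NJ algorithm on a precomputed distance matrix; how the authors derived distances from the 7,000 or 29,921 markers, and how they rooted on foxtail millet, is not shown here. Node labels are assumed to be unique and free of Newick special characters, and, as usual with NJ, branch lengths can come out negative on non-additive data.

```python
def neighbor_joining(names, d):
    """Build a neighbor-joining tree and return it as a Newick string.

    names: taxon labels; d: symmetric distance matrix (list of lists),
    zero diagonal. Root placement in the output is arbitrary.
    """
    # Active nodes are keyed by their Newick string; D[x][y] is their distance.
    D = {a: {b: d[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    while len(D) > 2:
        n = len(D)
        r = {x: sum(D[x].values()) for x in D}  # total distance from each node
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        a, b = min(((x, y) for x in D for y in D if x < y),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))  # branch a -> new node
        lb = D[a][b] - la                                   # branch b -> new node
        u = f"({a}:{la:g},{b}:{lb:g})"
        D[u] = {u: 0.0}
        for k in D:
            if k not in (a, b, u):
                D[u][k] = D[k][u] = 0.5 * (D[a][k] + D[b][k] - D[a][b])
        del D[a], D[b]
        for k in D:
            D[k].pop(a, None)
            D[k].pop(b, None)
    x, y = D
    return f"({x},{y}:{D[x][y]:g});"
```

On the standard five-taxon textbook matrix (rows a to e: `[0,5,9,9,8]`, `[5,0,10,10,9]`, `[9,10,0,8,7]`, `[9,10,8,0,3]`, `[8,9,7,3,0]`), the first join pairs a and b with branch lengths 2 and 3.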
Reduced diversity in Upland 4X compared with Upland 8X
[Figure: MDS plot (Coordinate 1 vs. Coordinate 2) of Upland 8X East, Upland 4X North, Upland 8X West, and Upland 8X South.]
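An MDS plot places individuals in a few dimensions so that plotted distances approximate their genetic distances; a tight cluster, as for Upland 4X here, indicates low diversity. The slide does not name the MDS variant, so this sketch uses classical (Torgerson) MDS on an assumed precomputed distance matrix.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed n points in k dimensions.

    dist: n x n symmetric distance matrix. Returns an n x k coordinate array
    whose columns correspond to Coordinate 1, Coordinate 2, and so on.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    b = -0.5 * j @ d2 @ j                    # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]       # keep the k largest
    vals_k = np.clip(vals[order], 0, None)   # drop small negative round-off
    return vecs[:, order] * np.sqrt(vals_k)
```

When the distances are exactly Euclidean in k dimensions, the returned coordinates reproduce them up to rotation and reflection; with real genetic distances the first two coordinates capture as much of the variation as a 2D plot can.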
Migration paths of switchgrass
[Figure: migration paths among Upland 4X North, Upland 8X East, Upland 8X West, Upland 8X South, Lowland 4X South, and Lowland 4X Northeast.]
Summary
An effective SNP calling pipeline has been developed; it works well for non-reference, heterozygous, and polyploid species.
720,000 high-density SNPs discovered for GWAS.
Tetraploid switchgrass behaves like a diploid.
A synteny-based SNP map was constructed with low-coverage GBS markers.
A robust phylogeny agrees well with the ecotype, ploidy level, and geographic distribution of switchgrass.
The data suggest that Upland 4X arose from Upland 8X.
Future directions
Putting it all together: GWAS and GS.
Field site: Caldwell Field, Cornell U, Ithaca, NY.
Traits: flowering time, plant height, leaf length and width, standability, biomass quality traits.
Populations: linkage populations and association populations.
Acknowledgements
Project Manager: Denise Costich (USDA ARS, Cornell)
PIs: Edward Buckler (USDA ARS, Cornell); Michael Casler (USDA ARS, UW Madison); Jerome Cherney (Cornell)
Sequencing: Rob Elshire, Jeff Glaubitz, Wenyan Zhu, Institute for Genomic Diversity (Cornell)
Statistics: Alex Lipka
Bioinformatics: Dallas Kroon
Field: Ken Paddock, Nick Lepak, Nick Kaczmar
Supported by DOE (including JGI), USDA, and NSF