Genotyping By Sequencing (GBS) Method Overview

Genotyping By Sequencing (GBS) Method Overview. RJ Elshire, JC Glaubitz, Q Sun, JV Harriman, ES Buckler, and SE Mitchell. http://www.maizegenetics.net/

Topics Presented: Background/Goals; GBS lab protocol; Illumina sequencing review; GBS adapter system; How GBS differs from RAD; Modifying GBS for different species; Workflow in our lab

Background: Genotyping by sequencing (GBS) in any large-genome species requires reduction of genome complexity. I. Target enrichment: long-range PCR of specific genes or genomic subsets; molecular inversion probes; sequence capture approaches, hybridization based (microarrays). II. Restriction enzymes (REs), *technically less challenging*: methylation-sensitive REs filter out the repetitive genomic fraction.

QTL are often located in non-coding regions: Vgt1, Tb, B regulatory regions 60-150 kb from gene. [Figure: a QTL shown alongside the exons of a gene; labels: "Map large numbers of genome-wide markers" and "Exon capture".]

Our goal is to create a public genotyping/informatics platform based on next-generation sequencing: Create inexpensive, robust multiplex sequencing protocol; Low-coverage sequencing (Illumina GA); Informatics pipeline; Combine genotypic & phenotypic data for GS and GWAS; Impute missing data; Anchor markers across the genome

Open Source: Method available for anyone to use / modify; Analysis pipeline details and code are public; Promote dataset compatibility; Method published in PLoS ONE to promote accessibility; Genotype calls publicly available

Overview of Genotyping by Sequencing (GBS): Focuses next-gen sequencing power on the ends of restriction fragments; Scores both SNPs and presence/absence markers. [Figure: restriction sites and sequence tags (< 450 bp) in Sample1 and Sample2, illustrating loss of a cut site.]

GBS is a simple, highly multiplexed system for constructing libraries for next-gen sequencing: Reduced sample handling; Few PCR & purification steps; No DNA size fractionation; Efficient barcoding system; Simultaneous marker discovery & genotyping; Scales very well

GBS 96-plex Protocol (http://www.maizegenetics.net/gbs-overview), now at 384-plex: 1. Plate DNA & adapter pair; 2. Digest DNA with RE; 3. Ligate adapters

GBS Adapters and Enzymes. Barcode adapter (P1): Illumina Sequencing Primer 1, barcode (4-8 bp), sticky end. Common adapter (P2): Illumina Sequencing Primer 2, sticky end. Restriction enzymes (5' to 3', ^ marks the cut): ApeKI G^CWGC; PstI CTGCA^G; EcoT22I ATGCA^T

GBS 96-plex Protocol (http://www.maizegenetics.net/gbs-overview): 1. Plate DNA & adapter pair; 2. Digest DNA with RE; 3. Ligate adapters; 4. Pool 96 samples, clean up; 5. PCR (primers added)

PCR primers: Pooled digestion/ligation reactions (n=94) are amplified by PCR to give the GBS library. [Figure: inserts flanked by adapters P1 and P2; labels O1 and O2.]

GBS 96-plex Protocol (http://www.maizegenetics.net/gbs-overview): 1. Plate DNA & adapter pair; 2. Digest DNA with RE; 3. Ligate adapters; 4. Pool 96 DNA aliquots, clean up; 5. PCR (primers added), clean up; 6. Evaluate fragment sizes

Perform Titration to Minimize Adapter Dimers Before Sequencing. NOTE: Done once with a small number of samples. Adapter dimers constitute only 0.05% of raw sequence reads. [Figure: traces of fluorescence intensity vs. time with 15 bp and 1500 bp size standards, showing the adapter dimer peak, the library, and the optimal adapter amount.]

Small Fragments are Enriched in GBS Libraries. [Figure: percentage of total fragments by ApeKI fragment size (bp), comparing the reference genome (B73 RefGen v1) with IBM (B73 x Mo17) RILs.]

384-plex GBS Results for Maize. Mean read count per line = 528,000; cv = 0.22. [Figure: reads per line, y-axis 0 to 1,000,000.]
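The cv on this slide is the coefficient of variation: the standard deviation of the per-line read counts divided by their mean. A minimal Python sketch follows. The read counts are made-up illustration values (not the maize data), and the use of the sample standard deviation is an assumption, since the slide does not say which estimator was used.

```python
import statistics

def coefficient_of_variation(read_counts):
    """Standard deviation of the read counts divided by their mean."""
    mean = statistics.mean(read_counts)
    # Sample standard deviation (n - 1 denominator); an assumption here.
    return statistics.stdev(read_counts) / mean

# Hypothetical per-line read counts, for illustration only.
example_counts = [400_000, 500_000, 600_000]
example_cv = coefficient_of_variation(example_counts)
print(f"mean = {statistics.mean(example_counts):.0f}, cv = {example_cv:.2f}")
```

A low cv means reads are spread fairly evenly across the pooled lines, which is what one hopes for at high multiplexing levels.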

Illumina Sequencing by Synthesis Review. Flowcell: 8 channels; solid-phase oligos

Cluster Formation Amplifies Sequencing Signal. [Figure: denatured library; flow cell with bound oligos; bridge amplification; linearization.]

Sequencing by Synthesis. [Figure: base-by-base extension from the P1 primer on the flowcell; HiSeq 2000.]

[Figure: base incorporation in phase vs. out of phase.]

GBS captures barcode and insert DNA sequence in a single read. [Figure: Illumina: Sequencing Primer 1 gives Read 1 (insert DNA) and Sequencing Primer 2 gives Read 2 (barcode). GBS: Sequencing Primer 1 gives Read 1 covering barcode, RE cut site, and insert DNA.]
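Because the barcode, the cut-site remnant, and the insert all sit in the same read, a read can be assigned to a sample by matching its first bases. The Python sketch below illustrates the idea for ApeKI, which cuts G^CWGC and so leaves CWGC (CAGC or CTGC) right after the barcode. The barcodes, sample names, and the function name `demultiplex` are hypothetical; this is not the authors' pipeline code.

```python
# ApeKI cuts G^CWGC, so reads continue with CAGC or CTGC after the barcode.
APEKI_REMNANTS = ("CAGC", "CTGC")

def demultiplex(read, barcode_to_sample, remnants=APEKI_REMNANTS):
    """Return (sample, insert) for a read, or None if nothing matches.

    The returned insert starts at the cut-site remnant.
    """
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    for barcode in sorted(barcode_to_sample, key=len, reverse=True):
        if read.startswith(barcode):
            rest = read[len(barcode):]
            if rest.startswith(remnants):
                return barcode_to_sample[barcode], rest
    return None

# Hypothetical barcodes of different lengths.
barcodes = {"CTCC": "sample_A", "TGCGA": "sample_B"}

print(demultiplex("CTCCCAGCAAATTT", barcodes))
print(demultiplex("TGCGACTGCGGG", barcodes))
print(demultiplex("AAAACAGC", barcodes))
```

Requiring the remnant directly after the barcode is a cheap sanity check: reads whose barcode is followed by anything else are discarded rather than assigned.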

Variable-Length GBS Barcodes Solve Sequence Phasing Issues: First 12 nt used to calculate phasing; Algorithm assumes random nt distribution; Incorrect phasing causes incorrect base calls; Good design and modulating the RE cut site position with variable-length barcodes produces even nt distribution. [Figure: read structure, Illumina (Read 1 insert DNA; Read 2 barcode) vs. GBS (barcode, RE cut site, insert DNA in one read).]
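The effect of barcode length on base balance can be checked by counting bases per sequencing cycle. In the Python sketch below, all barcodes and the insert are made up for illustration, and CAGC stands in for the ApeKI cut-site remnant. With fixed-length barcodes the remnant lands on the same cycles in every read; with variable-length barcodes it is spread out.

```python
from collections import Counter

def base_fractions(reads, n_positions=12):
    """Per-cycle base composition over the first n_positions cycles."""
    fractions = []
    for i in range(n_positions):
        counts = Counter(read[i] for read in reads if len(read) > i)
        total = sum(counts.values())
        fractions.append({b: counts[b] / total for b in "ACGT"} if total else {})
    return fractions

def max_base_share(reads, n_positions=12):
    """Share of the most common base at each cycle (1.0 = no diversity)."""
    return [max(f.values()) if f else 0.0
            for f in base_fractions(reads, n_positions)]

REMNANT, INSERT = "CAGC", "ATGCATGC"  # hypothetical cut-site remnant and insert
fixed = [bc + REMNANT + INSERT for bc in ("ACGT", "CGTA", "GTAC", "TACG")]
variable = [bc + REMNANT + INSERT for bc in ("ACGT", "CGTAA", "GTACGA", "TACGTTC")]

print("fixed   :", max_base_share(fixed)[4])     # cycle 5 is always the remnant's C
print("variable:", max_base_share(variable)[4])  # cycle 5 differs between reads
```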

Invariant GBS barcodes cause loss of signal intensity

Successful GBS sequencing run

Most significant GBS technical issue? DNA quality

GBS does not use standard Y adapters. Standard (Y adapters), ligation then PCR: 100% of fragments have different ends. GBS ligation: 50% of fragments have different ends, 50% have same ends.

Same-ended Fragments Do Not Form Clusters. [Figure: denatured library; flow cell with bound oligos; bridge amplification; cluster formation & linearization; labels "Cleaved from surface" and "No P1 binding site".]

GBS vs RAD: Focuses next-gen sequencing power on the ends of restriction fragments; Scores both SNPs and presence/absence markers. [Figure: restriction sites and sequence tags (< 450 bp) in Sample1 and Sample2, illustrating loss of a cut site.]

RAD: Digest; Ligate adapters; Pool; Random shear; Size select; Ligate Y adapters; PCR. GBS: Digest; Ligate adapters; Pool; PCR. Reference: Davey et al. 2011

Modifying GBS: Considerations for using GBS with new species and/or different enzymes

Why Modify the GBS Protocol? More markers; Fewer markers (deeper sequence coverage per locus); Increase multiplexing; More genome appropriate (avoid more repetitive DNA classes); Other novel applications (e.g., bisulfite sequencing)

Genome Sampling Strategies Vary by Species. Dependent on factors that affect diversity: Mating system (outcrosser, inbreeder, clonal?); Ploidy (haploid, diploid, auto- or allopolyploid?); Geographical distribution (island population, cosmopolitan?)

Other Factors. Genome size: The size of the genome has some bearing on the size of the fragment pool; Amount of repetitive DNA is directly correlated with genome size. Genome composition: The composition of the genome can affect the frequency and distribution of the cut sites.
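How genome composition changes cut-site frequency can be explored with an in silico digest. The Python sketch below finds every occurrence of a recognition site and returns the fragment lengths. The function name `digest` and the toy sequences are hypothetical, and the sketch ignores methylation sensitivity and incomplete digestion, so it is only a rough first look at a candidate enzyme.

```python
import re

def digest(sequence, site_regex, cut_offset):
    """Return fragment lengths after cutting at every match of site_regex.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    # A lookahead also finds overlapping recognition sites.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={site_regex})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# ApeKI recognizes GCWGC (W = A or T) and cuts after the first G.
apeki_fragments = digest("AAAGCAGCTTTTGCTGCAA", "GC[AT]GC", cut_offset=1)
# PstI recognizes CTGCAG and cuts after the A.
psti_fragments = digest("AACTGCAGTT", "CTGCAG", cut_offset=5)

print(apeki_fragments)
print(psti_fragments)
```

Run on a real reference sequence, the list of lengths can be binned to see what share of fragments falls in a sequenceable size range for each enzyme.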

Sampling large genomes with methylation-sensitive restriction enzymes. [Figure: GBS library and sequenced fragments from methylated and unmethylated DNA, for a 5-base cutter and a 6-base cutter.]

Optimizing GBS in New Species: Grape, Maize, Cacao, Sorghum, Rice, Shrub willow, Goose, Barley, Raspberry, Deer Mouse, Tunicate, Vole, Giant squid, Cassava, Solitary Bee, Yeast, Scrub jay

Choosing Appropriate Restriction Enzymes Life is like a bowl of chocolates

ApeKI Giant Squid PstI Goose

ApeKI Deer Mouse PstI

ApeKI Shrub Willow PstI EcoT22I

GBS Adapter Design

Barcode Design Considerations: Barcode sets are enzyme specific; Must not recreate the enzyme recognition site; Must have complementary overhangs; Sets must be of variable length; Bases must be well balanced at each position; Must be different enough from each other to avoid confusion if there is a sequencing error (at least 3 bp differences among barcodes); Must not nest within other barcodes; No mononucleotide runs of 3 or more bases. http://www.deenabio.com/services/gbs-adapters
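Several of these rules are mechanical enough to check in code. The Python sketch below is one possible checker, not the tool behind the adapter-design service linked above. It makes simplifying assumptions: differences are counted as Hamming distance between equal-length barcodes only, nesting is tested as a substring match, and the recognition-site test joins each barcode to a single ApeKI overhang (CAGC). Base balance per position and overhang complementarity are not checked. All barcodes shown are hypothetical.

```python
import re
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def check_barcodes(barcodes, site_regex="GC[AT]GC", overhang="CAGC", min_diff=3):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    for bc in barcodes:
        if re.search(r"(.)\1\1", bc):
            problems.append(f"{bc}: mononucleotide run of 3 or more")
        # Ligating the barcode to the overhang must not rebuild the site.
        if re.search(site_regex, bc + overhang):
            problems.append(f"{bc}: recreates the recognition site")
    for a, b in combinations(barcodes, 2):
        if a in b or b in a:
            problems.append(f"{a}/{b}: one barcode nests within the other")
        elif len(a) == len(b) and hamming(a, b) < min_diff:
            problems.append(f"{a}/{b}: fewer than {min_diff} differences")
    if len({len(bc) for bc in barcodes}) < 2:
        problems.append("set: all barcodes have the same length")
    return problems

good_set = ["CTCC", "TGCA", "ACTAT", "GAGTCA"]  # hypothetical barcodes
bad_set = ["AAAT", "AAAG"]                      # hypothetical barcodes

print(check_barcodes(good_set))
for problem in check_barcodes(bad_set):
    print(problem)
```

In the failing set, AAAG ends in G, so joining it to the CAGC overhang rebuilds GCAGC, an ApeKI site; this is why barcode sets are enzyme specific.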

GBS SNP calls in Sorghum bicolor (091211c10.hmp)

rs# alleles chrom SC283 BR007B RIL_100 RIL_101 RIL_102 RIL_103 RIL_104 RIL_105 RIL_106 RIL_107 RIL_108 RIL_109 RIL_10 RIL_110 RIL_111 RIL_112 RIL_113 RIL_114 RIL_115
S10_29954 C/A 10 A N A C A N N C C N N N N N N N N C N
S10_29963 A/C 10 C N C A C N N A A N N N N N N N N A N
S10_29965 T/C 10 C N C T C N N T T N N N N N N N N T N
S10_84178 C/G 10 N N C N N N N N N N N N N C N N N N
S10_84512 T/C 10 T T N N T N T N N N N C N N N N N N N
S10_84513 C/T 10 C C N N C N C N N N N T N N N N N N N
S10_85556 G/A 10 A A N N N R N A N R
S10_85556 G/A 10 N N A A N N N N A N
S10_115300 T/C 10 T C T C T C T N N T T N N N C N N T T
S10_133268 T/C 10 T C N N N N N T T N N N N N N N N N N
S10_158101 G/A 10 N N A N N N N N N N A A N N
S10_158101 G/A 10 A N A A N N N N N N N N N
S10_163724 G/A 10 A N A N N N N N N N N N N N
S10_163929 T/G 10 T N T N N N N T T T N N
S10_163931 C/T 10 T C N C T N T N N T T N T C C C N T N
S10_180445 G/A 10 A N A N N A A N
S10_209049 C/G 10 N C S C C C C N C C N C N C N
S10_209231 G/A 10 N A A A N N N A A N
S10_214941 C/T 10 C N N T N N N N C N C N N N N N N C C
S10_234260 T/G 10 T N T N T T N T T T T T N N T N T
S10_234260 T/G 10 T T T N T T T T T T N T T T
S10_238937 G/A 10 N A N A N N N N N N A N N N N
S10_252729 G/T 10 T T N N N N N N N T
S10_252730 C/T 10 C T C T C N N C C C N N N N N T C C C
S10_261151 T/C 10 T C T C T C N N T T N N T N N N T N N
S10_268392 C/A 10 N N N N A N A N N N N N N N N N N N N
S10_268393 G/A 10 N N N N A N A N N N N N N N N N N N N
S10_274414 C/T 10 N N C N N N N C T C N N C T N N C N C
S10_281272 T/C 10 T N T N N N N N N N N N N N N N N T T
S10_287523 G/T 10 N N N N N N N N N N N N N N
S10_292020 A/T 10 A T N N A N N A A A N A A N T T A A A
S10_292020 A/T 10 A T A T N N N A N A N A N N T T A N A
S10_294189 C/T 10 N N C N C N C C C N N N C T N N C N C
S10_302638 C/T 10 C T C T C N N C N C C N C T T N C C N
S10_312560 A/C 10 A C A N A N A A A A A A A C C N A A A
S10_347848 T/C 10 T C T C T C N T T T T N T N C N T T T
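The many N calls in a table like this can be summarized as a missing-data rate per marker and per sample. The Python sketch below assumes a whitespace-separated layout of marker name, alleles, chromosome, then one call per sample, with N meaning missing. The function name `missing_rates` and the two-marker example are hypothetical, and rows whose call count does not match the header are skipped rather than guessed at.

```python
def missing_rates(lines, missing="N"):
    """Return (per_marker, per_sample) missing-call rates.

    per_marker is a list of (marker, rate) pairs, since marker names can repeat.
    """
    samples = lines[0].split()[3:]
    per_marker = []
    missing_per_sample = dict.fromkeys(samples, 0)
    n_markers = 0
    for line in lines[1:]:
        marker, _alleles, _chrom, *calls = line.split()
        if len(calls) != len(samples):
            continue  # skip malformed rows rather than guess at alignment
        n_markers += 1
        per_marker.append((marker, calls.count(missing) / len(calls)))
        for sample, call in zip(samples, calls):
            if call == missing:
                missing_per_sample[sample] += 1
    per_sample = ({s: c / n_markers for s, c in missing_per_sample.items()}
                  if n_markers else {})
    return per_marker, per_sample

# Hypothetical two-marker, four-sample table for illustration.
example = [
    "rs# alleles chrom P1 P2 RIL_1 RIL_2",
    "m1 C/A 10 A C N A",
    "m2 T/C 10 N N T N",
]
per_marker, per_sample = missing_rates(example)
print(per_marker)
print(per_sample)
```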

Missing Data Strategies. Impute. Technical options: reduce the multiplexing level; sequence the same library multiple times. Molecular options: choose less frequently cutting enzymes.

GBS Workflow: DNA sample info entered to webform; Project approved?; Samples shipped; HTS database; Lab: GBS libraries made; DNA sequence data; Analysis pipelines (reference genome, non-reference genome); Data file: SNPs/sample; Data file: States/sample

GBS Team: Rob Elshire, Ed Buckler, Sharon Mitchell, Jeff Glaubitz, James Harriman, Qi Sun