GBS Bioinformatics Pipeline(s) Overview
- Lynette Shanna Knight
- 6 years ago
1 GBS Bioinformatics Pipeline(s) Overview
Getting from sequence files to genotypes.
Pipeline Coding: Ed Buckler, Jeff Glaubitz, James Harriman
Presentation: Terry Casstevens, with supporting information from the coders.
2 Philosophy of the GBS Pipeline
Why develop our own pipeline?
- Efficiency
- Only align unique fragments once.
- Data structures specific to these data types.
3 Three Pipelines
Discovery Pipeline
- Requires a reference genome
- Multiple steps to get to genotypes
- Hands-on tutorial is based on this pipeline
Production Pipeline
- Uses information from the Discovery Pipeline
- One step from sequence to genotypes
UNEAK Pipeline
- For species without a reference genome
- Fei Lu will present this tomorrow at 9:30
4 Vocabulary
- Sequence File: text file containing DNA sequence and supplemental information from the Illumina platform.
- Taxa: an individual sample.
- Key File: text file used to assign a GBS Bar Code to a Taxa.
- GBS Tag: DNA sequence consisting of a cut site remnant and additional sequence.
- GBS Bar Code: a short, known sequence of DNA used to assign a GBS Tag to its original Taxa.
5 GBS Discovery Pipeline
(Diagram of the Discovery pipeline; labels: Sequence, Tags by Taxa, Tag Counts, TOPM, SNP Caller.)
7 Raw Sequence (Qseq)
HWI-ST GTCGATTCTGCTGACTTCATGGCTTCTGTTGACG
HWI-ST GAGAATCAGCTTTTCCAACACCTTGAGTTTGAGT
HWI-ST ATGTACTGCACCGTTGCAAGCGAGCACCACCAA
HWI-ST CCAGCTCAGCCTGCATTCTTTCAAAAACTTCCAA
HWI-ST GATTTTACTGCACATCGGTCTTGTCACACCAGCT
HWI-ST TCACCCAGCATCACGCCCCTTCACATCCAGTAAA
HWI-ST CTTGACTGCCACCATGAATATGTGTTCCAAGTGC
HWI-ST CCACAACTGCTCCATCTTTTCCATGAGACATTGC
HWI-ST GTATTCTGCACACGAATCAGCTGAGACACCAATT
HWI-ST AATATGCCAGCAGTTAAGAGAGTTCAAGATCCAG
HWI-ST CTCCCTGCGGGTGCGCGCGACCCATCTTCAGTT
HWI-ST TGGTACGTCTGCGGAATGGCGTTTTTTATGCCTT
HWI-ST GGACCTACTGCCCAAGAACGGCTCACCCATCAT
HWI-ST GAGAATCAGCGTGTACGGGGCACGGGGTGACT
HWI-ST TTCTCCAGCCGCATGGGCCGGAGACCAGAGAG
HWI-ST GCGTCAGCAAATGCCCCAACAGCCAAGTCAGCA
HWI-ST TAGGCCATCAGCTGACTTCCCGGGTGTGGAGAA
HWI-ST GGACCTACTGCCGGCGGGACGAAAGCGGTTGT
HWI-ST CTCCCTGTTGAAGCATGTGCAAAAGAGCTTGTTC
HWI-ST CGCCTTATCTGCCCTCGCCGGTCATGGGGAGTG
9 Key File
Flowcell   Lane  Barcode  DNASample  LibraryPlate  Row  Column  LibraryPrepID  PlateName
81PVTABXX  2     CTCC     Sample_1   1             A    1       1              Plate_A
81PVTABXX  2     TGCA     Sample_2   1             A    2       2              Plate_A
81PVTABXX  2     ACTA     Sample_3   1             A    3       3              Plate_A
81PVTABXX  2     CAGA     Sample_4   1             A    4       4              Plate_A
81PVTABXX  2     AACT     Sample_5   1             A    5       5              Plate_A
81PVTABXX  2     GCGT     Sample_6   1             A    6       6              Plate_A
81PVTABXX  2     TGCGA    Sample_7   1             A    7       7              Plate_A
81PVTABXX  2     CGAT     Sample_8   1             A    8       8              Plate_A
81PVTABXX  2     CGCTT    Sample_9   1             A    9       9              Plate_A
81PVTABXX  2     TCACC    Sample_10  1             A                           Plate_A
81PVTABXX  2     CTAGC    Sample_11  1             A                           Plate_A
81PVTABXX  2     ACAAA    Sample_12  1             A                           Plate_A
81PVTABXX  2     TTCTC    Sample_13  1             B    1       13             Plate_A
81PVTABXX  2     AGCCC    Sample_14  1             B    2       14             Plate_A
81PVTABXX  2     GTATT    Sample_15  1             B    3       15             Plate_A
81PVTABXX  2     CTGTA    Sample_16  1             B    4       16             Plate_A
81PVTABXX  2     ACCGT    Sample_17  1             B    5       17             Plate_A
81PVTABXX  2     GTAA     Sample_18  1             B    6       18             Plate_A
81PVTABXX  2     GGTTGT   Sample_19  1             B    7       19             Plate_A
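The key file's role, assigning a barcode to a sample within one flowcell lane, can be sketched as a simple lookup. This is a minimal illustration, not the pipeline's own parser: it assumes the key file is tab-delimited with the header names shown above, and the function name `barcode_to_sample` is hypothetical.

```python
import csv
import io

def barcode_to_sample(key_file_text, flowcell, lane):
    """Return {barcode: sample name} for one flowcell lane of a key file.

    Assumes tab-delimited text with a header row (an assumption, not
    something the slides state).
    """
    reader = csv.DictReader(io.StringIO(key_file_text), delimiter="\t")
    return {
        row["Barcode"]: row["DNASample"]
        for row in reader
        if row["Flowcell"] == flowcell and row["Lane"] == str(lane)
    }

# Two rows taken from the key file slide, reduced to the columns used here.
example_key_file = (
    "Flowcell\tLane\tBarcode\tDNASample\n"
    "81PVTABXX\t2\tCTCC\tSample_1\n"
    "81PVTABXX\t2\tTGCA\tSample_2\n"
)
lookup = barcode_to_sample(example_key_file, "81PVTABXX", 2)
```

Filtering on both flowcell and lane matters because the same barcode is normally reused across lanes for different samples.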
10 GBS Tags
Fragment from GBS library: Barcode adapter, Cut site, Insert, Cut site, Common adapter
Good reads (only the first 64 bases after the barcode are kept):
- typical read: Barcode, Cut site, Insert (first 64 bases)
- short fragment: Barcode, Cut site, Insert (<64 bp), Cut site, Common adapter; kept up to the second cut site
- chimera or partial digestion: Barcode, Cut site, Insert (<64 bp), Cut site, 2nd Insert; kept up to the second cut site
Rejected reads:
- Not matching barcode and cut site remnant
- Contains N in first 64 bases after the barcode
- Adapter dimer: Barcode, Cut site, Common adapter
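The acceptance rules above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the pipeline's code: the cut site remnants passed in are placeholders that depend on the enzyme used, the function name `extract_tag` is hypothetical, and the sketch does not cover adapter dimer detection or trimming at a second cut site.

```python
TAG_LENGTH = 64  # bases kept after the barcode, per the slide

def extract_tag(read, barcodes, cut_site_remnants):
    """Return (barcode, tag) for a good read, or None for a rejected read."""
    remnants = tuple(cut_site_remnants)
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    for barcode in sorted(barcodes, key=len, reverse=True):
        if read.startswith(tuple(barcode + remnant for remnant in remnants)):
            tag = read[len(barcode):len(barcode) + TAG_LENGTH]
            if "N" in tag:
                return None  # reject: N within the kept bases
            return barcode, tag
    return None  # reject: no barcode followed by a cut site remnant

# Illustrative inputs only; "CAGC" and "CTGC" are assumed remnants.
good_read = "CTCC" + "CAGC" + "A" * 70
bad_read = "CTCC" + "CAGC" + "AANAA" + "A" * 65
result = extract_tag(good_read, ["CTCC", "TGCA"], ("CAGC", "CTGC"))
```

Note that the tag starts at the cut site remnant, which matches the vocabulary slide's definition of a GBS Tag.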
14 Tag Counts
- With information from the key file, each sequence file is processed, and tags are identified and counted.
- If a tag is shorter than 64 bases, it is padded.
- The tags and counts are put into a tag count file for each sequence file.
Plugins: QseqToTagCountsPlugin / FastqToTagCountsPlugin
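Counting with padding can be sketched as follows. The slide does not say which base is used for padding, so `PAD_BASE` below is an assumption, and `count_tags` is a hypothetical helper rather than the plugin's logic.

```python
from collections import Counter

TAG_LENGTH = 64
PAD_BASE = "A"  # assumption: the slides do not name the padding base

def count_tags(tags):
    """Count identical tags after padding short ones to TAG_LENGTH."""
    return Counter(tag.ljust(TAG_LENGTH, PAD_BASE) for tag in tags)

# Short illustrative tags; real tags would come from the sequence file.
tag_counts = count_tags(["CAGCTT", "CAGCTT", "CTGCAA"])
```

Padding to a fixed length lets every tag be stored in the same fixed-width structure, which fits the "data structures specific to these data types" goal stated earlier.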
15 Master Tag Counts
- The individual tag count files are merged into a master tag count file.
- A minimum count is specified at the merge stage to exclude tags with low counts (likely sequencing errors).
Plugin: MergeMultipleTagCountsPlugin
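The merge step amounts to summing counts across files and then applying the minimum count. A minimal sketch, with the hypothetical name `merge_tag_counts` and in-memory dictionaries standing in for tag count files:

```python
from collections import Counter

def merge_tag_counts(per_file_counts, min_count):
    """Sum tag counts across files, then drop tags below min_count."""
    total = Counter()
    for counts in per_file_counts:
        total.update(counts)
    return {tag: n for tag, n in total.items() if n >= min_count}

lane_1 = {"CAGCAAA": 4, "CAGCTTT": 1}
lane_2 = {"CAGCAAA": 2, "CTGCGGG": 1}
master = merge_tag_counts([lane_1, lane_2], min_count=2)
```

Applying the threshold after summing, rather than per file, keeps a tag that is rare in each lane but common overall.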
16 Conversion of Tags to Fastq
- Sequence aligners do not work with the tag count file format.
- In preparation for the alignment step, the tag count file is converted to fastq format.
Plugin: TagCountsToFastqPlugin
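A FASTQ record is four lines: a header starting with `@`, the sequence, a `+` separator, and a quality string of the same length as the sequence. The sketch below shows the conversion in principle; the record naming scheme and the constant placeholder quality are assumptions for illustration, not the plugin's actual output.

```python
def tags_to_fastq(tag_counts):
    """Render each tag as one four-line FASTQ record."""
    records = []
    for index, (tag, count) in enumerate(sorted(tag_counts.items())):
        # Tags carry no per-base quality, so a constant placeholder is used.
        quality = "I" * len(tag)
        records.append(f"@tag{index}_count{count}\n{tag}\n+\n{quality}\n")
    return "".join(records)

fastq_text = tags_to_fastq({"ACGT": 3})
```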
18 Tag Alignment / TOPM
- The GBS pipeline uses an external aligner to do the initial alignment.
- The current version uses bowtie2, which produces the alignment in SAM format.
- We convert the SAM file into our Tags On Physical Map (TOPM) format.
Plugin: SAMConverterPlugin
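The information a tag-to-position map needs can be read from the first four mandatory SAM fields (QNAME, FLAG, RNAME, POS). The sketch below is a stand-in for that idea only: the returned dictionary is not the TOPM file format, and `sam_to_positions` is a hypothetical name.

```python
def sam_to_positions(sam_text):
    """Map each tag name to (chromosome, position, strand), or None if unaligned."""
    positions = {}
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:  # SAM flag bit 0x4: segment unmapped
            positions[name] = None
        else:
            strand = "-" if flag & 0x10 else "+"  # bit 0x10: reverse strand
            positions[name] = (chrom, pos, strand)
    return positions

example_sam = (
    "@HD\tVN:1.0\n"
    "tag1\t0\t1\t2100\t42\t64M\t*\t0\t0\tACGT\tIIII\n"
    "tag2\t16\t1\t2163\t42\t64M\t*\t0\t0\tACGT\tIIII\n"
    "tag3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
)
positions = sam_to_positions(example_sam)
```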
19 TOPM
20 So Far We Have
- Identified and counted GBS tags.
- Converted the tag counts file to fastq.
- Aligned the tags to a reference.
- Converted the alignment to TOPM.
22 Tags by Taxa
In this step we identify which tags are present in which taxa.
Inputs:
- Original Sequence Files
- Key File
- Master Tag Count File
Recently migrated to the HDF5 file format:
- Efficient storage
- Large data sets
Plugin: SeqToTBTHDF5Plugin
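Conceptually, Tags by Taxa is a table of tag counts per taxon, restricted to the tags in the master tag count file. A minimal in-memory sketch follows; nested dictionaries stand in for the HDF5 storage, and `build_tbt` is a hypothetical name.

```python
from collections import defaultdict

def build_tbt(observations, master_tags):
    """Build {tag: {taxon: count}} from (taxon, tag) observations.

    Tags absent from the master tag list are ignored.
    """
    keep = set(master_tags)
    tbt = defaultdict(lambda: defaultdict(int))
    for taxon, tag in observations:
        if tag in keep:
            tbt[tag][taxon] += 1
    return {tag: dict(by_taxon) for tag, by_taxon in tbt.items()}

observations = [
    ("Sample_1", "CAGCAAA"),
    ("Sample_1", "CAGCAAA"),
    ("Sample_2", "CAGCAAA"),
    ("Sample_2", "CAGCTTT"),  # not in the master list, so dropped
]
tbt = build_tbt(observations, master_tags=["CAGCAAA"])
```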
23 Tags By Taxa Additional Operations
- If many TBTs have been created, they are merged into one TBT.
- Taxa that were sequenced multiple times are merged.
- The TBT table is pivoted in preparation for SNP calling.
Plugin: ModifyTBTHDF5Plugin
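Two of these operations can be sketched on an in-memory table. The sketch assumes that "pivoted" means transposing the table from tag-major to taxon-major order, which the slide does not spell out; the helper names and the `name:run` taxon naming are hypothetical.

```python
def merge_taxa(tbt, canonical_name):
    """Sum counts for taxa that map to the same canonical name."""
    merged = {}
    for tag, by_taxon in tbt.items():
        row = {}
        for taxon, count in by_taxon.items():
            name = canonical_name(taxon)
            row[name] = row.get(name, 0) + count
        merged[tag] = row
    return merged

def pivot(tbt):
    """Transpose {tag: {taxon: count}} into {taxon: {tag: count}}."""
    pivoted = {}
    for tag, by_taxon in tbt.items():
        for taxon, count in by_taxon.items():
            pivoted.setdefault(taxon, {})[tag] = count
    return pivoted

# Hypothetical naming: the same taxon sequenced in two runs.
tbt = {"TAG1": {"B73:run1": 2, "B73:run2": 3, "Mo17:run1": 1}}
merged = merge_taxa(tbt, lambda name: name.split(":")[0])
by_taxon = pivot(merged)
```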
25 SNP Calling
Files used in SNP calling:
- TOPM
- TBT
Some key settings:
- mnf: MinimumF (inbreeding coefficient)
- mnmaf: Minimum Minor Allele Frequency
- mnmac: Minimum Minor Allele Count
- mnlcov: Minimum Locus Coverage
Plugin: TagsToSNPByAlignmentPlugin
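To make the four settings concrete, the sketch below computes the corresponding statistics for one site. It rests on several assumptions: genotypes are two-character diploid calls with `N` for missing, the site is biallelic, F is estimated as 1 - Hobs/Hexp with Hexp = 2pq, and a site must pass all four thresholds. The slides do not say how the plugin computes or combines these, so treat this as an illustration of the quantities only.

```python
def site_stats(genotypes):
    """Compute minor allele count/frequency, coverage, and F for one site."""
    called = [g for g in genotypes if "N" not in g]
    alleles = "".join(called)
    counts = sorted((alleles.count(a) for a in set(alleles)), reverse=True)
    mac = counts[1] if len(counts) > 1 else 0        # minor allele count
    maf = mac / len(alleles) if alleles else 0.0     # minor allele frequency
    coverage = len(called) / len(genotypes) if genotypes else 0.0
    h_obs = sum(g[0] != g[1] for g in called) / len(called) if called else 0.0
    h_exp = 2 * maf * (1 - maf)                      # biallelic assumption
    f = 1 - h_obs / h_exp if h_exp > 0 else None     # undefined if monomorphic
    return {"mac": mac, "maf": maf, "coverage": coverage, "f": f}

def passes_filters(stats, min_f, min_maf, min_mac, min_coverage):
    """Assumed rule: a site is kept only if every threshold is met."""
    return (
        stats["f"] is not None
        and stats["f"] >= min_f
        and stats["maf"] >= min_maf
        and stats["mac"] >= min_mac
        and stats["coverage"] >= min_coverage
    )

stats = site_stats(["AA", "AA", "GG", "GG", "NN"])
```

In fully inbred material heterozygotes are rare, so a true SNP should give F close to 1, while a site created by misaligned paralogous tags tends to show excess heterozygosity and a low F.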
26 HapMap
rs# alleles chrom pos strand SgSBRIL067:633Y5AAXX:2:C9 SgSBRIL019:633Y5AAXX:2:C3
S1_2100 A/G N N N N N N N R N A N
S1_2163 T/C N N N N N N T C T T N
S1_13837 T/G N N N N N N N G N N T
S1_14606 C/T N N C N N N T T T T C
S1_2061 T/A T N N N N N N A N N N
S1_68332 C/T N N N N N N N N N N N
S1_68596 A/T A N N N N N N N N A N
S1_69309 G/A N G N N N N N A N N N
S1_79955 T/G N T G T T N T T N N N
S1_79961 T/G N T T T T N T T N N N
S1_80584 G N N N N N N N N N N G
S1_80647 C/T N N N N N N N C N N C
S1_81274 T/G N N N N N N T G N N N
S1_ G/A N N N N N N N N N N N
S1_ T/G N N N N N N K T N N N
S1_ C/T N N N N N N T C N T
S1_ T/C N N N N N N N C N N N
S1_ G/A G G A N N G G G G N
S1_ T/G N N T N N N T T N N T
S1_ A/G N A G N N N G A N N N
S1_ C/T N N N N C N N C N N N
S1_ T/C N T N N N N
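The genotype columns use single-letter IUPAC nucleotide codes: a plain base is a homozygous call, a two-base ambiguity code such as R or K is a heterozygous call, and N is missing. A small decoding sketch (the function name `decode_calls` is hypothetical):

```python
# IUPAC codes for single bases and two-base ambiguities; N means missing.
IUPAC_TO_DIPLOID = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "N": "NN",
}

def decode_calls(calls):
    """Expand single-letter HapMap calls into two-allele genotypes."""
    return [IUPAC_TO_DIPLOID[call] for call in calls]

# Genotype calls from the first row of the table above (site S1_2100, A/G).
decoded = decode_calls("N N N N N N N R N A N".split())
```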
27 GBS Discovery Pipeline
(Diagram of the Discovery pipeline; labels: Fastq, Tags by Taxa, Tag Counts, TOPM, SNP Caller, Filtered.)
29 Production Pipeline
30 Why another pipeline?
- The last maize build (30,000 taxa) with the discovery pipeline took over 3 months.
- Most common alleles have been identified after the first few discovery builds.
- Use the information from the discovery pipeline to call SNPs in new runs quickly.
- Improve efficiency and automate.
31 GBS Bioinformatics Pipelines
(Diagram comparing the Discovery and Production pipelines; labels: Discovery, Production, Fastq, Tags by Taxa, Tag Counts, TOPM, TagsOnPhysicalMap (TOPM), SNP Caller, Filtered.)
37 Running the Production Pipeline
Required files:
- Sequence file (fastq or qseq)
- Key file
- Production TOPM
- TASSEL 3 Standalone & RawReadsToHapMapPlugin
Running the pipeline:
- One lane processed at a time
- HapMap files by chromosome
- ~20 minutes
38 Testing Production Pipeline
Compared HapMap files produced by the Discovery Pipeline and the Production Pipeline.
Site comparison:
- Discovery: 48,139
- Production: 47,676
- Difference due to maximum of 8 alleles
99.98% correlation of genetic distance matrices
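One way to compare two genetic distance matrices is a Pearson correlation over their off-diagonal entries. The slides do not say which correlation was used, so the sketch below is only an illustration of the idea, with made-up toy matrices and hypothetical helper names.

```python
import math

def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Toy matrices for three taxa; the second is the first scaled by two.
discovery = [[0.0, 0.1, 0.3], [0.1, 0.0, 0.2], [0.3, 0.2, 0.0]]
production = [[0.0, 0.2, 0.6], [0.2, 0.0, 0.4], [0.6, 0.4, 0.0]]
r = pearson(upper_triangle(discovery), upper_triangle(production))
```

Only the upper triangle is used because a distance matrix is symmetric with a zero diagonal, so the other entries add no information.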
39 Next Steps In Pipeline Development
- Hierarchical Data Format supports very large data sets and complex data structures.
- Working to fuse TOPM, TBT, Keyfile, and Pedigree File into one HDF5 repository.
- Continued improvements to SNP caller.
- Ability to use tags not present in the reference.
More informationPhylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign
Phylogenomics, Multiple Sequence Alignment, and Metagenomics Tandy Warnow University of Illinois at Urbana-Champaign Phylogeny (evolutionary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the
More information1 Introduction. Abstract
CBS 530 Assignment No 2 SHUBHRA GUPTA shubhg@asu.edu 993755974 Review of the papers: Construction and Analysis of a Human-Chimpanzee Comparative Clone Map and Intra- and Interspecific Variation in Primate
More informationLearning ancestral genetic processes using nonparametric Bayesian models
Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew
More informationLecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values. Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 2013
Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 013 1 Estimation of Var(A) and Breeding Values in General Pedigrees The classic
More informationIntroduction to the SNP/ND concept - Phylogeny on WGS data
Introduction to the SNP/ND concept - Phylogeny on WGS data Johanne Ahrenfeldt PhD student Overview What is Phylogeny and what can it be used for Single Nucleotide Polymorphism (SNP) methods CSI Phylogeny
More information"Omics" - Experimental Approachs 11/18/05
"Omics" - Experimental Approachs Bioinformatics Seminars "Omics" Experimental Approaches Nov 18 Fri 12:10 BCB Seminar in E164 Lago Using P-Values for the Planning and Analysis of Microarray Experiments
More informationSpecies Tree Inference using SVDquartets
Species Tree Inference using SVDquartets Laura Kubatko and Dave Swofford May 19, 2015 Laura Kubatko SVDquartets May 19, 2015 1 / 11 SVDquartets In this tutorial, we ll discuss several different data types:
More informationWhole genome sequencing (WGS) - there s a new tool in town. Henrik Hasman DTU - Food
Whole genome sequencing (WGS) - there s a new tool in town Henrik Hasman DTU - Food Welcome to the NGS world TODAY Welcome Introduction to Next Generation Sequencing DNA purification (Hands-on) Lunch (Sandwishes
More informationChapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype.
Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype. please read pages 38-47; 49-55;57-63. Slide 1 of Chapter 2 1 Extension sot Mendelian Behavior of Genes Single gene inheritance
More informationLearning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study
Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Rui Wang, Yong Li, XiaoFeng Wang, Haixu Tang and Xiaoyong Zhou Indiana University at Bloomington
More informationI519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB
I519 Introduction to Bioinformatics, 2011 Genome Comparison Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Whole genome comparison/alignment Build better phylogenies Identify polymorphism
More informationHow to analyze many contingency tables simultaneously?
How to analyze many contingency tables simultaneously? Thorsten Dickhaus Humboldt-Universität zu Berlin Beuth Hochschule für Technik Berlin, 31.10.2012 Outline Motivation: Genetic association studies Statistical
More informationMaize Genetics Cooperation Newsletter Vol Derkach 1
Maize Genetics Cooperation Newsletter Vol 91 2017 Derkach 1 RELATIONSHIP BETWEEN MAIZE LANCASTER INBRED LINES ACCORDING TO SNP-ANALYSIS Derkach K. V., Satarova T. M., Dzubetsky B. V., Borysova V. V., Cherchel
More informationObjectives. Announcements. Comparison of mitosis and meiosis
Announcements Colloquium sessions for which you can get credit posted on web site: Feb 20, 27 Mar 6, 13, 20 Apr 17, 24 May 15. Review study CD that came with text for lab this week (especially mitosis
More informationIntroduction to population genetics & evolution
Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics
More informationAssembly improvement: based on Ragout approach. student: Anna Lioznova scientific advisor: Son Pham
Assembly improvement: based on Ragout approach student: Anna Lioznova scientific advisor: Son Pham Plan Ragout overview Datasets Assembly improvements Quality overlap graph paired-end reads Coverage Plan
More information