Explore SNP polymorphism data. A. Dereeper, Y. Hueber

Size: px
Start display at page:

Download "Explore SNP polymorphism data. A. Dereeper, Y. Hueber"

Transcription

1 Explore SNP polymorphism data A. Dereeper, Y. Hueber Bioinformatics trainings, Supagro, February, 2016

2 Tablet Graphical tool to visualize assemblies Accept many formats ACE, SAM, BAM

3 GATK (Genome Analysis ToolKit) So>ware package to analyse NGS data. Implemented to analyse human resequencing data, for medical purpose (1000 genomes, The Cancer Genome Atlas) Includes depth analyses, quality score recalibralon, SNP/InDel deteclon Complementary with other packages: SamTools, PicardTools, VCFtools, BEDtools PREPROCESS: * Index human genome (Picard), we used HG18 from UCSC. * Convert Illumina reads to Fastq format * Convert Illumina 1.6 read quality scores to standard Sanger scores FOR EACH SAMPLE: 1. Align samples to genome (BWA), generates SAI files. 2. Convert SAI to SAM (BWA) 3. Convert SAM to BAM binary format (SAM Tools) 4. Sort BAM (SAM Tools) 5. Index BAM (SAM Tools) 6. Identify target regions for realignment (Genome Analysis Toolkit) 7. Realign BAM to get better Indel calling (Genome Analysis Toolkit) 8. Reindex the realigned BAM (SAM Tools) 9. Call Indels (Genome Analysis Toolkit) 10. Call SNPs (Genome Analysis Toolkit) 11. View aligned reads in BAM/BAI (Integrated Genome Viewer)

4 Fastq (RC1) Fastq (RC2) Fastq (RC3) Fastq (RC4) Cutadapt Mapping BWA Cutadapt Cutadapt Cutadapt Mapping BWA Mapping BWA Mapping BWA. Add or Replace Groups Add or Replace Groups Add or Replace Groups Add or Replace Groups BAM with read group BAM with read group BAM with read group BAM with read group mergesam Global BAM with read group VCF file

5 For GBS data Tassel pipeline Version 5 TASSEL- GBS Plos One, 2014

6 GBS RAD- Seq RNA- Seq WGRS Reads pre- processing and mapping + SNP Calling and genotype assignalon Tassel Galaxy workflow Genotyping data Storage and mining Genotyping data analyses and visualizalon (GWAS, diversity )

7 Format VCF (Variant Call Format) Advantages: VariaLon descriplon for each posilon + genotype assignalons Indexed flat files. Binary files also exist: BCF format

8 Format Pileup - Another format for variant calling (generated by samtools) - Describe alignment row by row (not line by line like in SAM format) - Used by so>wares such as Varscan (varscan pileup2snp) - Frequently used for rare variants, with a low frequency (e.g. viral pop)

9 Autres Other GATK fonctionalités functionalities GATK Module DepthOfCoverage: Allows to get sequencing depth for each gene, each posilon and each individual Module ReadBackedPhasing: Allows to set, if possible, associalons between alleles (phase and haplotypes) when we are in an heterozygote situalon. Et non AGG GGA

10 Haplotypes and phasing Haplotype: Specific groups of genes or alleles that progeny inherited from one parent Phasing: DeterminaLon of haplotype phase. Process of stalslcal eslmalon of haplotypes from genotype data. Can be infered by stalslcs methods using non- ambigous haplotypes present in the dataset (Gevalt, ShapeIT, Phase) Can be resolved using physical associalon of alleles within the reads (GATK ReadBackedPhasing, GATK HaplotypeCaller)

11 Projet Gigwa, pour la gestion des données massives de variants (GBS, RADSeq, WGRS) «With NGS arise serious computalonal challenges in terms of storage, search, sharing, analysis, and data visualizalon, that redefine some praclces in data management.» - Based on NoSQL technology - Handles VCF files (Variant Call Format) and annotalons - Supports mullple variant types: SNPs, InDels, SSRs, SV - Powerful genotyping queries - Easily scalable with MongoDB sharding - Transparent access - Takes phasing informalon into account when imporlng/exporlng in VCF format

12 hep://gigwa.southgreen.fr/gigwa/

13 SNiPlay: Web application for polymorphism analyses hep://sniplay.southgreen.fr

14 IFB project Galaxy4Sniplay (WP4 IFB, Plant node)

15 Upload a VCF file in SNiPlay hep://sniplay.southgreen.fr Upload a VCF file (+ reference if not available in genome collec@on) Select rice genome The reference corresponce to mrna A. Dereeper, Y. Hueber Bioinformatics trainings, Supagro, February, 2016

16 Filters using VCFtools or Gigwa Maf Missing data AnnotaLon PosiLon

17 SNP annotation using SnpEff It annotates and predicts the effects of variants on genes (amino acid changes ) Uses as input GFF annoalon file and VCF

18

19 structure (snmf, Admixture) Test different values of K (es.mates the probability (likelihood tests) that samples are structured in K popula.ons) For the best value of K, the applicalon shows Q eslmates for each individual (admixture percent) (probability that the individual belongs to each popula.on)

20 MDS (MulL- Dimensional Scale) plot SNP- based Distance tree with FastME

21 Pi: NucleoLde diversity: Average number of nucleolde differences per site between any two DNA sequences chosen randomly from the sample populalon Used to measure the degree of polymorphism within a populalon Diversity analysis Comparison between individuals Fst: FixaLon index: measure of populalon differenlalon due to genelc structure.

22 SNP density by individuals can allow the deteclon of introgression event. Introgression = Movement of a exogene region (gene flow) from one species into the gene pool of another by the repeated backcrossing of an interspecific hybrid with one of its parent species Widely used in agronomy obtained but can occurs naturally

23 Haplotypes Haplotype reconstruclon using Gevalt Network with Haplophyle Available only for regions presenlng few variants (short regions, genes) Exploit phased VCF (in progress ) High frequency haplotypes Group distribution whithin this haplotype Low frequency haplotype Distance between 2 haplotypes (nb of mutations)

24 GWAS (Genome-Wide Association Studies) EsLmate associalon between a marker and a phenotypic character Manhaean plots: displays GWAS stalslcal tests (- log10 pvalue) along chromosomes TASSEL, MLMM sofwares False posilves because of the studied structuralon panel => correclon using structure populalon et and kinship

25 GWAS issues Choice of genotypic panel: phenotypic diversity for target traits must be sufficient (core- colleclon, MAGIC lines, NAM ) Popula@on structure induces high rates of false associalons (false posilves) CorrecLon using structure populalon et and kinship. Mixed models: o Q o K (widely used) o Q+K (widely used) Density of markers must be enough to provide a good genome cover. Density can be also highly variable. Linkage disequilibrium (LD) landscape: level of intra- and inter- chromosomal LD (number of loci in LD with loci from other chromosomes). Ideally, LD profile must be flat to avoid distorsion in associalon paeerns.

26 TD: Study of root characters using GWAS in Oryza sativa japonica. Influence of a correction using structure and kinship

27 Relatedness between individuals (kinship matrix) TASSEL and plink so>wares EsLmaLon of relatedness between individuals using a distance matrix

Accounting for read depth in the analysis of genotyping-by-sequencing data

Accounting for read depth in the analysis of genotyping-by-sequencing data Accounting for read depth in the analysis of genotyping-by-sequencing data Ken Dodds, John McEwan, Timothy Bilton, Rudi Brauning, Rayna Anderson, Tracey Van Stijn, Theodor Kristjánsson, Shannon Clarke

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:

More information

Introduction to PLINK H3ABionet Course Covenant University, Nigeria

Introduction to PLINK H3ABionet Course Covenant University, Nigeria UNIVERSITY OF THE WITWATERSRAND, JOHANNESBURG Introduction to PLINK H3ABionet Course Covenant University, Nigeria Scott Hazelhurst H3ABioNet funded by NHGRI grant number U41HG006941 Wits Bioinformatics

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Variant visualisation and quality control

Variant visualisation and quality control Variant visualisation and quality control You really should be making plots! 25/06/14 Paul Theodor Pyl 1 Classical Sequencing Example DNA.BAM.VCF Aligner Variant Caller A single sample sequencing run 25/06/14

More information

Supplementary Methods and Figures

Supplementary Methods and Figures Whole-genome resequencing of honeybee drones to detect genomic selection in a population managed for royal jelly David Wragg 1*, Maria Marti 1, Benjamin Basso 2, Jean-Pierre Bidanel 3, Emmanuelle Labarthe

More information

Genetic diversity and population structure in rice. S. Kresovich 1,2 and T. Tai 3,5. Plant Breeding Dept, Cornell University, Ithaca, NY

Genetic diversity and population structure in rice. S. Kresovich 1,2 and T. Tai 3,5. Plant Breeding Dept, Cornell University, Ithaca, NY Genetic diversity and population structure in rice S. McCouch 1, A. Garris 1,2, J. Edwards 1, H. Lu 1,3 M Redus 4, J. Coburn 1, N. Rutger 4, S. Kresovich 1,2 and T. Tai 3,5 1 Plant Breeding Dept, Cornell

More information

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics 1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

Microsatellite data analysis. Tomáš Fér & Filip Kolář

Microsatellite data analysis. Tomáš Fér & Filip Kolář Microsatellite data analysis Tomáš Fér & Filip Kolář Multilocus data dominant heterozygotes and homozygotes cannot be distinguished binary biallelic data (fragments) presence (dominant allele/heterozygote)

More information

Sta$s$cal Physics, Inference and Applica$ons to Biology

Sta$s$cal Physics, Inference and Applica$ons to Biology Sta$s$cal Physics, Inference and Applica$ons to Biology Physics Department, Ecole Normale Superieure, Paris, France. Simona Cocco Office:GH301 mail:cocco@lps.ens.fr Deriving Protein Structure and Func$on

More information

Genotype Imputation. Biostatistics 666

Genotype Imputation. Biostatistics 666 Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives

More information

Fei Lu. Post doctoral Associate Cornell University

Fei Lu. Post doctoral Associate Cornell University Fei Lu Post doctoral Associate Cornell University http://www.maizegenetics.net Genotyping by sequencing (GBS) is simple and cost effective 1. Digest DNA 2. Ligate adapters with barcodes 3. Pool DNAs 4.

More information

Predictive Genome Analysis Using Partial DNA Sequencing Data

Predictive Genome Analysis Using Partial DNA Sequencing Data Predictive Genome Analysis Using Partial DNA Sequencing Data Nauman Ahmed, Koen Bertels and Zaid Al-Ars Computer Engineering Lab, Delft University of Technology, Delft, The Netherlands {n.ahmed, k.l.m.bertels,

More information

Bayesian Inference of Interactions and Associations

Bayesian Inference of Interactions and Associations Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,

More information

Comparative Genomics of Fagaceae

Comparative Genomics of Fagaceae Fagaceae Images.google.com Linkage Map www.quia.com TM www.clipartlord.com Selection of mapping parents SM2 SM1 Predominant pollinator? Progeny Exclusion for Full Sib Linkage Mapping Year Acorns genotyped

More information

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) 12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

opulation genetics undamentals for SNP datasets

opulation genetics undamentals for SNP datasets opulation genetics undamentals for SNP datasets with crocodiles) Sam Banks Charles Darwin University sam.banks@cdu.edu.au I ve got a SNP genotype dataset, now what? Do my data meet the requirements of

More information

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES: .5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the

More information

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype.

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype. Series 2: Cross Diagrams - Complementation There are two alleles for each trait in a diploid organism In C. elegans gene symbols are ALWAYS italicized. To represent two different genes on the same chromosome:

More information

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012 Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities

More information

Learning gene regulatory networks Statistical methods for haplotype inference Part I

Learning gene regulatory networks Statistical methods for haplotype inference Part I Learning gene regulatory networks Statistical methods for haplotype inference Part I Input: Measurement of mrn levels of all genes from microarray or rna sequencing Samples (e.g. 200 patients with lung

More information

High-throughput sequencing: Alignment and related topic

High-throughput sequencing: Alignment and related topic High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg HTS Platforms E s ta b lis h e d p la tfo rm s Illu m in a H is e q, A B I S O L id, R o c h e 4 5 4 N e w c o m e rs

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans University of Bristol This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

GENETICS - CLUTCH CH.1 INTRODUCTION TO GENETICS.

GENETICS - CLUTCH CH.1 INTRODUCTION TO GENETICS. !! www.clutchprep.com CONCEPT: HISTORY OF GENETICS The earliest use of genetics was through of plants and animals (8000-1000 B.C.) Selective breeding (artificial selection) is the process of breeding organisms

More information

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017 Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping

More information

Research Statement on Statistics Jun Zhang

Research Statement on Statistics Jun Zhang Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation

More information

Evolution of phenotypic traits

Evolution of phenotypic traits Quantitative genetics Evolution of phenotypic traits Very few phenotypic traits are controlled by one locus, as in our previous discussion of genetics and evolution Quantitative genetics considers characters

More information

Learning ancestral genetic processes using nonparametric Bayesian models

Learning ancestral genetic processes using nonparametric Bayesian models Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew

More information

Browsing Genomic Information with Ensembl Plants

Browsing Genomic Information with Ensembl Plants Browsing Genomic Information with Ensembl Plants Etienne de Villiers, PhD (Adapted from slides by Bert Overduin EMBL-EBI) Outline of workshop Brief introduction to Ensembl Plants History Content Tutorial

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Package KMgene. November 22, 2017

Package KMgene. November 22, 2017 Type Package Package KMgene November 22, 2017 Title Gene-Based Association Analysis for Complex Traits Version 1.2 Author Qi Yan Maintainer Qi Yan Gene based association test between a

More information

New imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype)

New imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype) New imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype) Kelly Swarts PAG Allele Mining 1/11/2014 Imputation is the projection

More information

Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency

Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency Bruce Walsh lecture notes Introduction to Quantitative Genetics SISG, Seattle 16 18 July 2018 1 Outline Genetics of complex

More information

Detecting selection from differentiation between populations: the FLK and hapflk approach.

Detecting selection from differentiation between populations: the FLK and hapflk approach. Detecting selection from differentiation between populations: the FLK and hapflk approach. Bertrand Servin bservin@toulouse.inra.fr Maria-Ines Fariello, Simon Boitard, Claude Chevalet, Magali SanCristobal,

More information

RNA- seq read mapping

RNA- seq read mapping RNA- seq read mapping Pär Engström SciLifeLab RNA- seq workshop October 216 IniDal steps in RNA- seq data processing 1. Quality checks on reads 2. Trim 3' adapters (opdonal (for species with a reference

More information

BIOLOGY STANDARDS BASED RUBRIC

BIOLOGY STANDARDS BASED RUBRIC BIOLOGY STANDARDS BASED RUBRIC STUDENTS WILL UNDERSTAND THAT THE FUNDAMENTAL PROCESSES OF ALL LIVING THINGS DEPEND ON A VARIETY OF SPECIALIZED CELL STRUCTURES AND CHEMICAL PROCESSES. First Semester Benchmarks:

More information

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power Proportional Variance Explained by QTL and Statistical Power Partitioning the Genetic Variance We previously focused on obtaining variance components of a quantitative trait to determine the proportion

More information

Supplementary Information for Discovery and characterization of indel and point mutations

Supplementary Information for Discovery and characterization of indel and point mutations Supplementary Information for Discovery and characterization of indel and point mutations using DeNovoGear Avinash Ramu 1 Michiel J. Noordam 1 Rachel S. Schwartz 2 Arthur Wuster 3 Matthew E. Hurles 3 Reed

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

A mixed model based QTL / AM analysis of interactions (G by G, G by E, G by treatment) for plant breeding

A mixed model based QTL / AM analysis of interactions (G by G, G by E, G by treatment) for plant breeding Professur Pflanzenzüchtung Professur Pflanzenzüchtung A mixed model based QTL / AM analysis of interactions (G by G, G by E, G by treatment) for plant breeding Jens Léon 4. November 2014, Oulu Workshop

More information

Unit 3 - Molecular Biology & Genetics - Review Packet

Unit 3 - Molecular Biology & Genetics - Review Packet Name Date Hour Unit 3 - Molecular Biology & Genetics - Review Packet True / False Questions - Indicate True or False for the following statements. 1. Eye color, hair color and the shape of your ears can

More information

Nature Genetics: doi: /ng Supplementary Figure 1. The phenotypes of PI , BR121, and Harosoy under short-day conditions.

Nature Genetics: doi: /ng Supplementary Figure 1. The phenotypes of PI , BR121, and Harosoy under short-day conditions. Supplementary Figure 1 The phenotypes of PI 159925, BR121, and Harosoy under short-day conditions. (a) Plant height. (b) Number of branches. (c) Average internode length. (d) Number of nodes. (e) Pods

More information

Supplemental Information

Supplemental Information Molecular Cell, Volume 52 Supplemental Information The Translational Landscape of the Mammalian Cell Cycle Craig R. Stumpf, Melissa V. Moreno, Adam B. Olshen, Barry S. Taylor, and Davide Ruggero Supplemental

More information

Eiji Yamamoto 1,2, Hiroyoshi Iwata 3, Takanari Tanabata 4, Ritsuko Mizobuchi 1, Jun-ichi Yonemaru 1,ToshioYamamoto 1* and Masahiro Yano 5,6

Eiji Yamamoto 1,2, Hiroyoshi Iwata 3, Takanari Tanabata 4, Ritsuko Mizobuchi 1, Jun-ichi Yonemaru 1,ToshioYamamoto 1* and Masahiro Yano 5,6 Yamamoto et al. BMC Genetics 2014, 15:50 METHODOLOGY ARTICLE Open Access Effect of advanced intercrossing on genome structure and on the power to detect linked quantitative trait loci in a multi-parent

More information

MOLECULAR MAPS AND MARKERS FOR DIPLOID ROSES

MOLECULAR MAPS AND MARKERS FOR DIPLOID ROSES MOLECULAR MAPS AND MARKERS FOR DIPLOID ROSES Patricia E Klein, Mandy Yan, Ellen Young, Jeekin Lau, Stella Kang, Natalie Patterson, Natalie Anderson and David Byrne Department of Horticultural Sciences,

More information

Our typical RNA quantification pipeline

Our typical RNA quantification pipeline RNA-Seq primer Our typical RNA quantification pipeline Upload your sequence data (fastq) Align to the ribosome (Bow>e) Align remaining reads to genome (TopHat) or transcriptome (RSEM) Make report of quality

More information

Classical Selection, Balancing Selection, and Neutral Mutations

Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

2. Map genetic distance between markers

2. Map genetic distance between markers Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,

More information

Population genomic scans suggest novel genes underlie convergent flowering time evolution in the introduced range of Arabidopsis thaliana

Population genomic scans suggest novel genes underlie convergent flowering time evolution in the introduced range of Arabidopsis thaliana Molecular Ecology (2017) 26, 92 106 doi: 10.1111/mec.13643 SPECIAL ISSUE: THE MOLECULAR MECHANISMS OF ADAPTATION AND SPECIATION: INTEGRATING GENOMIC AND MOLECULAR APPROACHES Population genomic scans suggest

More information

Bayesian inference of ancient human demography from individual genome sequences

Bayesian inference of ancient human demography from individual genome sequences Bayesian inference of ancient human demography from individual genome sequences Ilan Gronau, Melissa J. Hubisz, Brad Gulko, Charles G. Danko, Adam Siepel SUPPLEMENTARY INFORMATION Supplementary Figures

More information

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M. STAT 550 Howework 6 Anton Amirov 1. This question relates to the same study you saw in Homework-4, by Dr. Arno Motulsky and coworkers, and published in Thompson et al. (1988; Am.J.Hum.Genet, 42, 113-124).

More information

SoyBase, the USDA-ARS Soybean Genetics and Genomics Database

SoyBase, the USDA-ARS Soybean Genetics and Genomics Database SoyBase, the USDA-ARS Soybean Genetics and Genomics Database David Grant Victoria Carollo Blake Steven B. Cannon Kevin Feeley Rex T. Nelson Nathan Weeks SoyBase Site Map and Navigation Video Tutorials:

More information

Heterozygous BMN lines

Heterozygous BMN lines Optical density at 80 hours 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 a YPD b YPD + 1µM nystatin c YPD + 2µM nystatin d YPD + 4µM nystatin 1 3 5 6 9 13 16 20 21 22 23 25 28 29 30

More information

Araport, a community portal for Arabidopsis. Data integration, sharing and reuse. sergio contrino University of Cambridge

Araport, a community portal for Arabidopsis. Data integration, sharing and reuse. sergio contrino University of Cambridge Araport, a community portal for Arabidopsis. Data integration, sharing and reuse sergio contrino University of Cambridge Acknowledgements J Craig Venter Institute Chris Town Agnes Chan Vivek Krishnakumar

More information

Cycle «Analyse de données de séquençage à haut-débit»

Cycle «Analyse de données de séquençage à haut-débit» Cycle «Analyse de données de séquençage à haut-débit» Module 1/5 Analyse ADN Chadi Saad CRIStAL - Équipe BONSAI - Univ Lille, CNRS, INRIA (chadi.saad@univ-lille.fr) Présentation de Sophie Gallina (source:

More information

Evaluating allopolyploid origins in strawberries (Fragaria) using haplotypes generated from target capture sequencing

Evaluating allopolyploid origins in strawberries (Fragaria) using haplotypes generated from target capture sequencing Kamneva et al. BMC Evolutionary Biology (2017) 17:180 DOI 10.1186/s12862-017-1019-7 RESEARCH ARTICLE Evaluating allopolyploid origins in strawberries (Fragaria) using haplotypes generated from target capture

More information

GTRAC FAST R ETRIEVAL FROM C OMPRESSED C OLLECTIONS OF G ENOMIC VARIANTS. Kedar Tatwawadi Mikel Hernaez Idoia Ochoa Tsachy Weissman

GTRAC FAST R ETRIEVAL FROM C OMPRESSED C OLLECTIONS OF G ENOMIC VARIANTS. Kedar Tatwawadi Mikel Hernaez Idoia Ochoa Tsachy Weissman GTRAC FAST R ETRIEVAL FROM C OMPRESSED C OLLECTIONS OF G ENOMIC VARIANTS Kedar Tatwawadi Mikel Hernaez Idoia Ochoa Tsachy Weissman Overview Introduction Results Algorithm Details Summary & Further Work

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

GBS Bioinformatics Pipeline(s) Overview

GBS Bioinformatics Pipeline(s) Overview GBS Bioinformatics Pipeline(s) Overview Getting from sequence files to genotypes. Pipeline Coding: Ed Buckler Jeff Glaubitz James Harriman Presentation: Rob Elshire With supporting information from the

More information

Solutions to Problem Set 4

Solutions to Problem Set 4 Question 1 Solutions to 7.014 Problem Set 4 Because you have not read much scientific literature, you decide to study the genetics of garden peas. You have two pure breeding pea strains. One that is tall

More information

Microevolution (Ch 16) Test Bank

Microevolution (Ch 16) Test Bank Microevolution (Ch 16) Test Bank Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. Which of the following statements describes what all members

More information

p(d g A,g B )p(g B ), g B

p(d g A,g B )p(g B ), g B Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)

More information

Graduate Funding Information Center

Graduate Funding Information Center Graduate Funding Information Center UNC-Chapel Hill, The Graduate School Graduate Student Proposal Sponsor: Program Title: NESCent Graduate Fellowship Department: Biology Funding Type: Fellowship Year:

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

Managing segregating populations

Managing segregating populations Managing segregating populations Aim of the module At the end of the module, we should be able to: Apply the general principles of managing segregating populations generated from parental crossing; Describe

More information

Genotype Imputation. Class Discussion for January 19, 2016

Genotype Imputation. Class Discussion for January 19, 2016 Genotype Imputation Class Discussion for January 19, 2016 Intuition Patterns of genetic variation in one individual guide our interpretation of the genomes of other individuals Imputation uses previously

More information

GBS Bioinformatics Pipeline(s) Overview

GBS Bioinformatics Pipeline(s) Overview GBS Bioinformatics Pipeline(s) Overview Getting from sequence files to genotypes. Pipeline Coding: Ed Buckler Jeff Glaubitz James Harriman Presentation: Terry Casstevens With supporting information from

More information

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics: Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships

More information

Linkage and Linkage Disequilibrium

Linkage and Linkage Disequilibrium Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies

More information

Genotyping By Sequencing (GBS) Method Overview

Genotyping By Sequencing (GBS) Method Overview enotyping By Sequencing (BS) Method Overview RJ Elshire, JC laubitz, Q Sun, JV Harriman ES Buckler, and SE Mitchell http://wwwmaizegeneticsnet/ Topics Presented Background/oals BS lab protocol Illumina

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Rui Wang, Yong Li, XiaoFeng Wang, Haixu Tang and Xiaoyong Zhou Indiana University at Bloomington

More information

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype.

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype. Series 1: Cross Diagrams There are two alleles for each trait in a diploid organism In C. elegans gene symbols are ALWAYS italicized. To represent two different genes on the same chromosome: When both

More information

Multivariate analysis of genetic data an introduction

Multivariate analysis of genetic data an introduction Multivariate analysis of genetic data an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London Population genomics in Lausanne 23 Aug 2016 1/25 Outline Multivariate

More information

Nature Genetics: doi: /ng Supplementary Figure 1. ssp mutant phenotypes in a functional SP background.

Nature Genetics: doi: /ng Supplementary Figure 1. ssp mutant phenotypes in a functional SP background. Supplementary Figure 1 ssp mutant phenotypes in a functional SP background. (a,b) Statistical comparisons of primary and sympodial shoot flowering times as determined by mean values for leaf number on

More information

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall

More information

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo Friday Harbor 2017 From Genetics to GWAS (Genome-wide Association Study) Sept 7 2017 David Fardo Purpose: prepare for tomorrow s tutorial Genetic Variants Quality Control Imputation Association Visualization

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013 Lecture 28: BLUP and Genomic Selection Bruce Walsh lecture notes Synbreed course version 11 July 2013 1 BLUP Selection The idea behind BLUP selection is very straightforward: An appropriate mixed-model

More information

Introduction to Linkage Disequilibrium

Introduction to Linkage Disequilibrium Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have

More information

When one gene is wild type and the other mutant:

When one gene is wild type and the other mutant: Series 2: Cross Diagrams Linkage Analysis There are two alleles for each trait in a diploid organism In C. elegans gene symbols are ALWAYS italicized. To represent two different genes on the same chromosome:

More information

On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction problem

On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction problem On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction problem Paola Bonizzoni, Riccardo Dondi, Gunnar W. Klau, Yuri Pirola, Nadia Pisanti and Simone Zaccaria DISCo, computer

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Genetic and physiological approach to elucidation of Cd absorption mechanism by rice plants

Genetic and physiological approach to elucidation of Cd absorption mechanism by rice plants Genetic and physiological approach to elucidation of Cd absorption mechanism by rice plants Satoru Ishikawa National Institute for Agro-Environmental Sciences, 3-1-3, Kannondai, Tsukuba, Ibaraki, 305-8604,

More information

Principles of QTL Mapping. M.Imtiaz

Principles of QTL Mapping. M.Imtiaz Principles of QTL Mapping M.Imtiaz Introduction Definitions of terminology Reasons for QTL mapping Principles of QTL mapping Requirements For QTL Mapping Demonstration with experimental data Merit of QTL

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

Robust demographic inference from genomic and SNP data

Robust demographic inference from genomic and SNP data Robust demographic inference from genomic and SNP data Laurent Excoffier Isabelle Duperret, Emilia Huerta-Sanchez, Matthieu Foll, Vitor Sousa, Isabel Alves Computational and Molecular Population Genetics

More information

Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials

Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials Biostatistics (2013), pp. 1 31 doi:10.1093/biostatistics/kxt006 Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials XINYI LIN, SEUNGGUEN

More information

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Hui Zhou, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 April 30,

More information

Map of AP-Aligned Bio-Rad Kits with Learning Objectives

Map of AP-Aligned Bio-Rad Kits with Learning Objectives Map of AP-Aligned Bio-Rad Kits with Learning Objectives Cover more than one AP Biology Big Idea with these AP-aligned Bio-Rad kits. Big Idea 1 Big Idea 2 Big Idea 3 Big Idea 4 ThINQ! pglo Transformation

More information

GLIDE: GPU-based LInear Detection of Epistasis

GLIDE: GPU-based LInear Detection of Epistasis GLIDE: GPU-based LInear Detection of Epistasis Chloé-Agathe Azencott with Tony Kam-Thong, Lawrence Cayton, and Karsten Borgwardt Machine Learning and Computational Biology Research Group Max Planck Institute

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Genotyping By Sequencing (GBS) Method Overview

Genotyping By Sequencing (GBS) Method Overview enotyping By Sequencing (BS) Method Overview Sharon E Mitchell Institute for enomic Diversity Cornell University http://wwwmaizegeneticsnet/ Topics Presented Background/oals BS lab protocol Illumina sequencing

More information