Potato Genome Analysis

Xin Liu, Deputy Director, BGI Research. 2016.1.21, WCRTC 2016 @ Nanning

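Reference genome construction: sequencing breaks the genome into short, overlapping reads, shown here as fragments of a sentence (HELL, RIEND, WELCOME, BGI, ZHEN, LLOFRI, DSWEL, METOBG, HENZH, HELLOF, SWEL, METO, GISHEN, ELLOFR, DSW, COM, OBGI, ENZHEN, OFIIEN, WELCOM, GISH, NZHEN). Assembly joins the overlapping fragments back into the original text: HELLO FRIENDS WELCOME TO BGISHENZHEN.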
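The slide's analogy can be sketched as a toy greedy overlap assembler. This is an illustration only: it assumes error-free reads with the spaces removed, the reads are hypothetical fragments of "HELLOFRIENDSWELCOMETOBGISHENZHEN", and `overlap` and `greedy_assemble` are names chosen for this example. Production short-read assemblers generally work on k-mer (de Bruijn) graphs rather than greedy pairwise merging.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads: list[str]) -> str:
    """Toy greedy assembler: repeatedly merge the pair of reads with the largest overlap."""
    # Drop exact duplicates (order-preserving), then reads contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, 0, 1
        for i, j in permutations(range(len(reads)), 2):
            k = overlap(reads[i], reads[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        # If no pair overlaps at all (best_k == 0), this simply concatenates two reads.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)]
        reads.append(merged)
    return reads[0]


# Hypothetical error-free reads; "OFRI" is contained in another read and gets dropped.
reads = ["HELLOFRI", "OFRIENDSW", "NDSWELCOM", "LCOMETOBG", "ETOBGISHE", "GISHENZHEN", "OFRI"]
print(greedy_assemble(reads))  # HELLOFRIENDSWELCOMETOBGISHENZHEN
```

Greedy merging is easy to follow but can be misled by repeats, which is one reason real genome assemblers avoid it.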

Second-generation sequencing for assembly: construct libraries with hierarchical insert sizes (250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb, 40 kb); sequence the libraries to 60X genome coverage; de novo assembly; annotation and evolutionary analysis.
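Mean coverage is total sequenced bases divided by genome size, so a 60X target translates into a data volume once a genome size is fixed. The sketch below uses the 844 Mb k-mer estimate given in this deck and a hypothetical 100 bp read length; it ignores how data are split across the insert-size libraries and any reads lost to filtering. The function names are illustrative.

```python
def bases_for_coverage(coverage: int, genome_size_bp: int) -> int:
    """Total sequenced bases needed for a target mean depth: coverage x genome size."""
    return coverage * genome_size_bp


def reads_for_coverage(coverage: int, genome_size_bp: int, read_length_bp: int) -> int:
    """Number of fixed-length reads needed, rounded up (integer ceiling division)."""
    return -(-bases_for_coverage(coverage, genome_size_bp) // read_length_bp)


GENOME_SIZE_BP = 844_000_000  # k-mer based size estimate for DM reported in this deck
READ_LENGTH_BP = 100          # hypothetical read length, for illustration only

total = bases_for_coverage(60, GENOME_SIZE_BP)
print(f"60X needs {total / 1e9:.2f} Gb")  # 60X needs 50.64 Gb
print(reads_for_coverage(60, GENOME_SIZE_BP, READ_LENGTH_BP))  # 506400000
```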

Genome survey. Data and analyses: 1. 30X data; 2. k-mer analysis; 3. preliminary assembly; 4. heterozygosity simulation analysis; 5. GC-depth distribution analysis. Estimated properties: 1. genome size; 2. heterozygosity rate; 3. GC content; 4. repeat sequence proportion.
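The k-mer analysis starts from a k-mer frequency spectrum: how many distinct k-mers occur once, twice, and so on. A minimal sketch, assuming k-mers are counted on the read strand only (dedicated k-mer counters usually merge each k-mer with its reverse complement) and that depth-1 k-mers are mostly sequencing errors; the `min_depth` cut-off of 2 is an arbitrary assumption. The main peak depth is what genome-size estimation uses, and in a heterozygous genome a second peak near half that depth reflects heterozygous sites.

```python
from collections import Counter


def kmer_spectrum(reads: list[str], k: int) -> dict[int, int]:
    """Map each depth d to the number of distinct k-mers seen exactly d times."""
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    return dict(sorted(Counter(counts.values()).items()))


def peak_depth(spectrum: dict[int, int], min_depth: int = 2) -> int:
    """Depth with the most distinct k-mers, ignoring low depths dominated by errors."""
    return max((d for d in spectrum if d >= min_depth), key=lambda d: spectrum[d])


spectrum = kmer_spectrum(["ACGTAC", "CGTACG", "TTT"], 3)
print(spectrum)              # {1: 1, 2: 4}
print(peak_depth(spectrum))  # 2
```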

The potato genome: an important resource for crop improvement.

Information on the potato genome: autotetraploid (2n=4x=48); highly heterozygous; a heterozygous diploid is available; a doubled monoploid line is available; different datasets are available; genome size: 850 Mb.

Sample selection: DM1-3 516 R44 (DM) resulted from chromosome doubling of a monoploid (1n=1x=12) derived by anther culture of a heterozygous diploid (2n=2x=24) S. tuberosum group Phureja clone (PI 225669).

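Heterozygosity affecting genome assembly: heterozygosity breaks the assembly into fragments, because the two alleles at each heterozygous site create alternative paths that the assembler must resolve (Rei Kajitani, Kouta Toshimoto, Hideki Noguchi, et al.).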
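To see why heterozygosity fragments an assembly, consider a simplified k-mer (de Bruijn-style) graph built from two haplotypes that differ at a single SNP: the path splits where the alleles diverge and rejoins afterwards, forming a "bubble". An assembler must either collapse the bubble or break the contig there. The toy sequences, k = 4, and the function names are hypothetical; assemblers designed for heterozygous genomes handle bubbles far more elaborately.

```python
from collections import defaultdict


def kmer_graph(seqs: list[str], k: int):
    """Link each k-mer to the k-mer that follows it (overlap k-1) in any sequence."""
    succ: dict[str, set[str]] = defaultdict(set)
    pred: dict[str, set[str]] = defaultdict(set)
    for s in seqs:
        for i in range(len(s) - k):
            a, b = s[i:i + k], s[i + 1:i + 1 + k]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def branch_points(seqs: list[str], k: int):
    """K-mers where paths split (forks) or rejoin (joins); a fork then a join is a bubble."""
    succ, pred = kmer_graph(seqs, k)
    forks = sorted(n for n, nxt in succ.items() if len(nxt) > 1)
    joins = sorted(n for n, prv in pred.items() if len(prv) > 1)
    return forks, joins


hap1 = "GATTACACGTCC"
hap2 = "GATTACTCGTCC"  # differs from hap1 by one SNP (A/T)
print(branch_points([hap1], 4))        # ([], [])
print(branch_points([hap1, hap2], 4))  # (['TTAC'], ['CGTC'])
```

A single haplotype gives one unbranched path (one contig); adding the second haplotype splits it at the fork and join.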

Assessing the genome (S. tuberosum group Phureja DM1-3 516 R44): 33,761,617,031 bases; k-mer depth peak at 40; estimated genome size: 844 Mb.
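The 844 Mb figure follows from dividing the total number of k-mers by the depth at the main spectrum peak: 33,761,617,031 / 40 ≈ 844.04 million. Here the slide's base count is treated as the k-mer total, which is an assumption: strictly, a read of length L contributes L - k + 1 k-mers, so the k-mer total is somewhat smaller than the base count. The function name is illustrative.

```python
def estimate_genome_size(total_kmers: int, peak_depth: int) -> float:
    """Genome size ~ total k-mer count / depth at the main peak of the k-mer spectrum."""
    return total_kmers / peak_depth


# Slide values; the base count is used as a stand-in for the k-mer total (assumption).
size_bp = estimate_genome_size(33_761_617_031, 40)
print(f"{size_bp / 1e6:.0f} Mb")  # 844 Mb
```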

The heterozygous diploid S. tuberosum group Tuberosum RH89-039-16

Assessing the heterozygosity

Assemble the DM genome: data

The potato genome assembly: 727 Mb (86% of the estimated genome size), 6.1% Ns/gaps; N90 349 kb (443 superscaffolds). Figure tracks: a, chromosome karyotype; b, gene density; c, repeat coverage; d, transcription state; e, GC content; f, subtelomeric repeat distribution.
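"N90 349 kb (443 superscaffolds)" most likely means that the 443 longest superscaffolds, each at least 349 kb long, together hold 90% of the assembled sequence. A minimal sketch of the Nx/Lx calculation; the scaffold lengths are made up and `nx_stat` is an illustrative name.

```python
def nx_stat(lengths: list[int], x: float = 50) -> tuple[int, int]:
    """Return (Nx, Lx): the length L such that sequences >= L cover x% of the
    assembly, and how many of the longest sequences that takes."""
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no sequence lengths given")


scaffolds = [100, 80, 60, 40, 20]  # hypothetical scaffold lengths (kb)
print(nx_stat(scaffolds, 50))  # (80, 2)
print(nx_stat(scaffolds, 90))  # (40, 4)
```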

Comparing to Sanger-sequenced BACs: 97.1% of 181,558 available Sanger-sequenced S. tuberosum ESTs were detected in the assembly.

Comparing to Sanger-sequenced BACs

Comparing to BAC/fosmid ends

Anchoring to the chromosomes: 623 Mb (86% of the assembly) anchored to chromosomes, containing 90.3% of the genes.

Repeat annotation and assessment

Repeat content

Gene annotation (pipeline flowchart): the genomic sequence was annotated from several evidence types: homology-based genes (protein sequences; rough alignment, synteny information, precise alignment, post-filtering), ab initio genes (ab initio prediction), cDNA/EST genes (cDNA/EST sequence alignment), and RNA-seq reads mapped to the genome; TE proteins also appear in the flowchart. The gene sets were combined into a combined gene set, which was then modified to give the final gene set. RNA-seq: 31.5 Gb of data from 32 DM and 16 RH samples/tissues; 90.2% of 824,621,408 DM reads and 88.6% of 140,375,647 RH reads mapped. Final gene set: 39,031 protein-coding genes, 12.1% of them derived solely from ab initio gene predictions; 9,875 genes (25.3%) had alternative splicing.
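The gene-set combination step merges evidence from several sources. The slide does not give the pipeline's actual combination rules, so this is only a hypothetical sketch: it clusters overlapping gene models from all sources into loci and records which evidence types support each locus, which is one way a figure such as "derived solely from ab initio gene predictions" could be tallied. `GeneModel`, the source labels, and the overlap rule are assumptions; real combiners also weigh exon structure and evidence quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    source: str  # hypothetical evidence label: "homology", "ab_initio", "cdna_est", "rnaseq"


def combine_gene_sets(models: list[GeneModel]) -> list[dict]:
    """Cluster gene models that overlap on the same chromosome and strand into loci."""
    loci: list[dict] = []
    for m in sorted(models, key=lambda g: (g.chrom, g.strand, g.start, g.end)):
        last = loci[-1] if loci else None
        if last and (last["chrom"], last["strand"]) == (m.chrom, m.strand) and m.start <= last["end"]:
            last["end"] = max(last["end"], m.end)
            last["sources"].add(m.source)
        else:
            loci.append({"chrom": m.chrom, "strand": m.strand, "start": m.start,
                         "end": m.end, "sources": {m.source}})
    return loci


def ab_initio_only_fraction(loci: list[dict]) -> float:
    """Share of loci supported by nothing but ab initio predictions."""
    return sum(locus["sources"] == {"ab_initio"} for locus in loci) / len(loci)


models = [
    GeneModel("chr1", 100, 500, "+", "homology"),
    GeneModel("chr1", 450, 900, "+", "ab_initio"),  # overlaps the homology model
    GeneModel("chr1", 200, 400, "-", "ab_initio"),  # opposite strand: separate locus
    GeneModel("chr2", 10, 50, "+", "rnaseq"),
    GeneModel("chr2", 60, 90, "+", "ab_initio"),
]
loci = combine_gene_sets(models)
print(len(loci), ab_initio_only_fraction(loci))  # 4 0.5
```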

Gene annotation result

Genome evolution: gene families. Genomes compared: monocots (Oryza sativa, Brachypodium distachyon, Sorghum bicolor, Zea mays), eudicots (Arabidopsis thaliana, Carica papaya, Populus trichocarpa, Vitis vinifera, Glycine max), and an alga and a moss (Chlamydomonas reinhardtii, Physcomitrella patens). 4,479 potato genes clustered in 3,181 families; 34,051 potato genes clustered with at least one other genome; 2,642 genes are asterid-specific; 3,372 genes are potato lineage-specific.
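Counts such as "clustered with at least one other genome" versus "potato lineage-specific" come from bookkeeping over gene-family clusters. The slide does not name the clustering tool, so this sketch assumes a hypothetical input format (family ID → species → gene IDs) and shows only the classification step; an asterid-specific category would additionally need a clade assignment for each genome.

```python
def classify_focal_genes(clusters: dict[str, dict[str, list[str]]], focal: str = "potato"):
    """Split the focal species' clustered genes into those shared with other genomes
    and those whose family contains only the focal species."""
    shared: set[str] = set()
    lineage_specific: set[str] = set()
    for members in clusters.values():
        genes = set(members.get(focal, []))
        if not genes:
            continue
        others = [sp for sp, ids in members.items() if sp != focal and ids]
        (shared if others else lineage_specific).update(genes)
    return shared, lineage_specific


clusters = {  # hypothetical clustering output: family -> species -> gene IDs
    "fam1": {"potato": ["p1", "p2"], "Vitis vinifera": ["v1"]},
    "fam2": {"potato": ["p3"]},
    "fam3": {"Oryza sativa": ["r1"], "Zea mays": ["z1"]},
}
shared, specific = classify_focal_genes(clusters)
print(sorted(shared), sorted(specific))  # ['p1', 'p2'] ['p3']
```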

Genome evolution: synteny. 1,811 syntenic blocks involving 10,046 genes.
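A syntenic block is a run of homologous gene pairs whose order is conserved in both genomes. The slide does not say how its blocks were detected; dedicated tools score and chain anchors with dynamic programming and also find inverted blocks. The sketch below is a much simpler greedy chainer, assuming anchors are given as gene-order indices on one chromosome pair and only same-orientation blocks are sought; `max_gap` and `min_anchors` are arbitrary illustrative thresholds.

```python
def collinear_blocks(anchors: list[tuple[int, int]], max_gap: int = 2,
                     min_anchors: int = 3) -> list[list[tuple[int, int]]]:
    """Greedily chain (gene index in genome A, gene index in genome B) anchors
    that advance together in both genomes by at most `max_gap` genes."""
    blocks: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    for a, b in sorted(anchors):
        if current and 0 < a - current[-1][0] <= max_gap and 0 < b - current[-1][1] <= max_gap:
            current.append((a, b))
        else:
            if len(current) >= min_anchors:
                blocks.append(current)
            current = [(a, b)]
    if len(current) >= min_anchors:
        blocks.append(current)
    return blocks


anchors = [(1, 10), (2, 11), (4, 12), (5, 14), (9, 40), (10, 3), (20, 50), (21, 51), (22, 52)]
for block in collinear_blocks(anchors):
    print(block)
# [(1, 10), (2, 11), (4, 12), (5, 14)]
# [(20, 50), (21, 51), (22, 52)]
```

A single out-of-order anchor inside a real block breaks this greedy chain, which is one reason production tools use scored chaining instead.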

Genome evolution: whole-genome duplication. Estimated event ages: ~89 MYA, ~67 MYA, and the γ event (~185 MYA).
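The slide gives event ages but not how they were estimated. One common approach dates a duplication from the synonymous substitution rate (Ks) between paralogous gene pairs, assuming a constant molecular clock: T = Ks / (2r), where r is the number of synonymous substitutions per site per year. The rate and Ks value below are hypothetical placeholders, not values from this study.

```python
def divergence_time_years(ks: float, rate_per_year: float) -> float:
    """Age of a duplication from synonymous divergence: T = Ks / (2 * r).

    The factor 2 is there because both gene copies accumulate substitutions
    independently after the duplication.
    """
    return ks / (2 * rate_per_year)


RATE = 1.5e-8  # hypothetical rate (synonymous substitutions per site per year)
print(f"{divergence_time_years(0.3, RATE) / 1e6:.1f} MYA")  # 10.0 MYA
```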

Genome evolution evidence for WGD

Comparing RH and DM: 1,644 RH BAC clones gave 178 Mb of non-redundant sequence (~10%). 99 Mb of RH sequence (55%) aligned to the DM genome; the aligned regions share 97.5% identity, with one SNP every 40 bp and one indel (12.8 bp on average) every 394 bp between RH and DM. Between the two RH haplotypes, 6.6 Mb of sequence could be aligned with 96.5% identity, with one SNP per 29 bp and one indel per 253 bp (average length 10.4 bp).
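These figures are mutually consistent: one SNP every 40 bp is a mismatch rate of 1/40 = 2.5%, i.e. 97.5% identity, and one SNP per 29 bp (about 3.4%) corresponds to roughly 96.5% identity between the RH haplotypes. A minimal sketch of computing such densities from a pairwise alignment, assuming identity is measured over gap-free columns and a run of consecutive gap columns counts as one indel; the study's exact definitions are not given, and the example alignment is made up.

```python
def variant_density(aln_a: str, aln_b: str) -> dict[str, float]:
    """Summarise SNPs and indels in a gapped pairwise alignment ('-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = snps = indels = indel_bases = 0
    in_gap = False
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            indel_bases += 1
            if not in_gap:  # consecutive gap columns form one indel event
                indels += 1
                in_gap = True
            continue
        in_gap = False
        aligned += 1
        if x != y:
            snps += 1
    if aligned == 0:
        raise ValueError("alignment has no gap-free columns")
    return {
        "aligned_bp": aligned,
        "identity": (aligned - snps) / aligned,
        "bp_per_snp": aligned / snps if snps else float("inf"),
        "bp_per_indel": aligned / indels if indels else float("inf"),
        "mean_indel_len": indel_bases / indels if indels else 0.0,
    }


stats = variant_density("ACGTACGTACGTAC--GTACGT",
                        "ACGAACGTACGTACTTGTACGA")
print(stats["bp_per_snp"], stats["bp_per_indel"], stats["mean_indel_len"])  # 10.0 20.0 2.0
```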

Comparing at the whole-genome level: 1,118 million NGS reads (84X) from RH; 457.3 million reads aligned, covering 659.1 Mb (90.6%) of the DM genome; 3.67 million SNPs identified. Variant classes examined: premature stop, frameshift, and presence/absence variants.

Inbreeding depression: 3,018 premature stop codons (606 homozygous and 2,412 heterozygous, 1,760 of which are specific); 80 frameshift mutations (49 homozygous and 31 heterozygous); 275 presence/absence variant (PAV) genes (246 RH-specific and 29 DM-specific).
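Calls like "premature stop" and "frameshift" reduce to simple rules on the coding sequence: an indel whose length is not a multiple of 3 shifts the reading frame, and a change that creates a stop codon before the annotated one truncates the protein. A minimal sketch, assuming a single variant given as a 0-based position in an in-frame CDS that ends with its stop codon; it ignores genotype, so on its own it cannot separate homozygous from heterozygous calls. All names and the example CDS are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Replace `ref` at 0-based `pos` with `alt`, checking the reference allele."""
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS")
    return cds[:pos] + alt + cds[pos + len(ref):]


def classify_coding_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Return 'frameshift', 'premature_stop', or 'other' for one variant."""
    if (len(alt) - len(ref)) % 3 != 0:
        return "frameshift"
    mutant = apply_variant(cds, pos, ref, alt)
    codons = [mutant[i:i + 3] for i in range(0, len(mutant) - 2, 3)]
    first_stop = next((i for i, c in enumerate(codons) if c in STOP_CODONS), None)
    if first_stop is not None and first_stop < len(codons) - 1:
        return "premature_stop"
    return "other"


cds = "ATGAAACAGTGGTAA"  # ATG AAA CAG TGG TAA (hypothetical)
print(classify_coding_variant(cds, 6, "C", "T"))   # premature_stop (CAG -> TAG)
print(classify_coding_variant(cds, 3, "A", "AT"))  # frameshift (1 bp insertion)
print(classify_coding_variant(cds, 9, "T", "C"))   # other (TGG -> CGG)
```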

Inbreeding depression: one instance of copy number variation; five genes with premature stop codons; seven RH-specific genes.

Tuber biology: 15,235 genes expressed in the transition from stolons to tubers; 1,217 transcripts with >5-fold expression in stolons versus five RH tuber tissues; 333 transcripts upregulated during the transition from stolons to tubers, notably proteinase inhibitors such as KTIs (Kunitz protease inhibitors).
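Selections such as "transcripts with >5-fold expression in stolons versus five RH tuber tissues" are threshold filters on expression ratios. A minimal sketch, assuming the ratio must exceed the threshold against every comparison tissue and using a pseudocount of 1 to avoid division by zero; the slide does not state how replicates, normalisation, or zero counts were handled, and the expression values are made up.

```python
def fold_change_hits(expr: dict[str, dict[str, float]], reference: str,
                     others: list[str], min_fold: float = 5.0,
                     pseudocount: float = 1.0) -> list[str]:
    """Transcripts whose `reference` level exceeds every tissue in `others` by > min_fold."""
    hits = []
    for transcript, levels in expr.items():
        ref = levels[reference] + pseudocount
        if all(ref / (levels[t] + pseudocount) > min_fold for t in others):
            hits.append(transcript)
    return sorted(hits)


expr = {  # hypothetical expression values (e.g. FPKM)
    "tx1": {"stolon": 99, "tuber1": 9, "tuber2": 4},
    "tx2": {"stolon": 50, "tuber1": 20, "tuber2": 1},
    "tx3": {"stolon": 0, "tuber1": 10, "tuber2": 10},
}
print(fold_change_hits(expr, "stolon", ["tuber1", "tuber2"]))  # ['tx1']
```

Swapping `reference` and `others` gives the opposite direction of change.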

KTI family: 28 Kunitz protease inhibitor genes (KTIs)

KTI family

Starch synthesis

Flowering time regulation for tuber induction

Disease resistance: many NBS-LRR genes, including R1 and R3a, are pseudogenes owing to indels, frameshift mutations, or premature stop codons, which might be driven by the rapid evolution of effector genes in the potato late blight pathogen, Phytophthora infestans. (Figure: 39.4%)

Acknowledgement