Potato Genome Analysis Xin Liu Deputy director BGI research 2016.1.21 WCRTC 2016 @ Nanning
Reference genome construction Sequencing: HELL, RIEND, WELCOME, BGI, ZHEN, LLOFRI, DSWEL, METOBG, HENZH, HELLOF, SWEL, METO, GISHEN, ELLOFR, DSW, COM, OBGI, ENZHEN, OFIIEN, WELCOM, GISH, NZHEN Assemble: HELLO FRIENDS WELCOME TO BGISHENZHEN
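The word puzzle on this slide can be mimicked with a toy greedy overlap assembler. This is only a sketch of the idea, not the assembler used for the potato genome: the four reads below are hypothetical (chosen so that the overlaps are unambiguous, not copied from the slide), and `overlap`, `greedy_assemble` and `min_len` are illustrative names.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads, listed out of order on purpose.
toy_reads = ["WELCOMETOBGI", "HELLOFRIENDS", "TOBGISHENZHEN", "FRIENDSWELCOME"]
contigs = greedy_assemble(toy_reads)
```

Real genomes contain repeats and sequencing errors, so a greedy merge like this one can misjoin reads; it is shown only to make the sequencing and assembly steps concrete.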
Second generation sequencing for assembly Construct libraries with hierarchical insert sizes: 250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb, 40 kb. Sequence the libraries to 60X genome coverage. De novo assembly. Annotation and evolutionary analysis.
Genome survey Steps: 1. 30X data 2. K-mer analysis 3. Preliminary assembly 4. Heterozygosity simulation analysis 5. GC depth distribution analysis Results: 1. Genome size 2. Heterozygosity rate 3. GC content 4. Repeat sequence proportion
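The k-mer analysis step can be sketched as follows: count every k-mer in the reads, then build a histogram of how many distinct k-mers occur at each depth. This is a minimal illustration under stated assumptions: the reads and `k=4` are hypothetical toy values, reverse complements are not merged (dedicated k-mer counters usually count canonical k-mers), and the function names are illustrative.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map depth -> number of distinct k-mers observed at that depth."""
    return Counter(kmer_counts.values())


# Hypothetical toy reads; real surveys use billions of k-mers.
toy_reads = ["ACGTAC", "CGTACG", "GTACGT"]
kmer_counts = count_kmers(toy_reads, k=4)
histogram = depth_histogram(kmer_counts)
total_kmers = sum(kmer_counts.values())
```

In a real survey the main peak of this histogram gives the k-mer depth used for the genome size estimate, and a secondary peak at roughly half that depth is a common sign of heterozygosity.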
The potato genome Would provide an important resource for crop improvement
Information on the potato genome Autotetraploid (2n=4x=48). Highly heterozygous. Heterozygous diploid available. Doubled monoploid available. Different datasets available. Genome size: 850 Mb
Sample selection DM1-3 516 R44 (DM) resulted from chromosome doubling of a monoploid (1n=1x=12) derived by anther culture of a heterozygous diploid (2n=2x=24) S. tuberosum group Phureja clone (PI 225669).
Heterozygosity affecting genome assembly Heterozygosity can break the assembly into fragments. Rei Kajitani, Kouta Toshimoto, Hideki Noguchi, et al.
Assess the genome 33,761,617,031 bases Peak at 40 Genome size estimated to be: 844 Mb S. tuberosum group Phureja DM1-3 516 R44
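The figures on this slide are consistent with the usual k-mer estimator, genome size = total count / peak depth. The sketch below assumes that the slide's 33,761,617,031 figure is the numerator of that estimator and that 40 is the main peak depth; `estimate_genome_size` is an illustrative name.

```python
def estimate_genome_size(total_count, peak_depth):
    """Estimate genome size in bases as total count divided by peak depth."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return total_count / peak_depth


# Values taken from the slide; their roles in the formula are an assumption.
genome_size_bp = estimate_genome_size(33_761_617_031, 40)
genome_size_mb = genome_size_bp / 1e6  # about 844 Mb
```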
The heterozygous diploid S. tuberosum group Tuberosum RH89-039-16
Assessing the heterozygosity
Assemble the DM genome: data
The potato genome assembly 727 Mb (6.1% Ns/gaps), 86% of the genome. N90 349 kb, 443 super scaffolds. Figure tracks: a: Chromosome karyotype b: Gene density c: Repeats coverage d: Transcription state e: GC content f: Subtelomeric repeats distribution
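N90 is commonly computed by sorting scaffolds from longest to shortest and reporting the length at which the running total first reaches 90% of the assembly; the number of scaffolds needed to get there is one reading of the slide's "443 super scaffolds". A minimal sketch with hypothetical scaffold lengths (`nx` is an illustrative name):

```python
def nx(lengths, fraction):
    """Return (length, count) where the running sum of sorted lengths
    first reaches `fraction` of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no scaffold lengths given")


# Hypothetical scaffold lengths in kb, not taken from the assembly.
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx(toy_lengths, 0.5)  # (80, 2)
n90 = nx(toy_lengths, 0.9)  # (40, 4)
```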
Comparing to Sanger sequenced BACs 97.1% of 181,558 available Sanger-sequenced S. tuberosum ESTs are represented in the assembly
Comparing to Sanger sequenced BACs
Comparing to BAC/fosmid ends
Anchoring to the chromosomes Anchored 623 Mb (86%) to chromosomes, with 90.3% of the genes on chromosomes
Repeat annotation and assessment
Repeat content
Gene annotation Pipeline diagram labels: genomic sequence; protein sequences; rough alignment; ab initio prediction; alignment of cDNA/EST sequences; synteny info; precise alignment; homology-based genes; post-filtering; ab initio genes; cDNA/EST genes; TE proteins; RNA-seq reads; genome mapping; gene sets combination; combined gene set; gene sets modification; final gene set. 12.1% derived solely from ab initio gene predictions. 31.5 Gb of RNA-Seq data from 32 DM and 16 RH samples/tissues. 90.2% of 824,621,408 DM reads and 88.6% of 140,375,647 RH reads mapped. Final gene set: 39,031 protein-coding genes. 9,875 genes (25.3%) had alternative splicing
Gene annotation result
Genome evolution - gene families Monocots: Oryza sativa, Brachypodium distachyon, Sorghum bicolor, Zea mays. Eudicots: Arabidopsis thaliana, Carica papaya, Populus trichocarpa, Vitis vinifera, Glycine max. Algae, moss: Chlamydomonas reinhardtii, Physcomitrella patens. 4,479 potato genes clustered in 3,181 families. 34,051 potato genes clustered with at least one genome. 2,642 genes are asterid-specific. 3,372 genes are potato lineage-specific
Genome evolution - synteny 1,811 syntenic blocks involving 10,046 genes
Genome evolution - whole genome duplication ~89 MYA, ~67 MYA, γ event (~185 MYA)
Genome evolution - evidence for WGD
Comparing RH and DM 1,644 RH BAC clones; 178 Mb of non-redundant sequence (~10%). 99 Mb of RH sequence (55%) aligned to the DM genome. The aligned regions have 97.5% identity, with one SNP every 40 bp and one indel (12.8 bp on average) every 394 bp between RH and DM. Between the two haplotypes, 6.6 Mb of sequence could be aligned with 96.5% identity, with one SNP per 29 bp and one indel per 253 bp (average length 10.4 bp)
Comparing at the whole genome level 1,118 million NGS reads (84X) from RH. 457.3 million reads aligned to 659.1 Mb (90.6%) of the DM genome. 3.67 million SNPs. Premature stop, frameshift, presence/absence variants
Inbreeding depression 3,018 premature stop codons (606 homozygous and 2,412 heterozygous, 1,760 of which are specific). 80 frameshift mutations (49 homozygous and 31 heterozygous). 275 PAV genes (246 RH-specific and 29 DM-specific)
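To make "premature stop codon" concrete, the sketch below scans a coding sequence codon by codon and reports the first in-frame stop that occurs before the final codon. It is an illustration only, not the variant-calling pipeline behind the counts on this slide: the sequences are hypothetical, the standard genetic code is assumed, and `first_premature_stop` is an illustrative name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon
    located before the last codon, or None if there is none."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the last codon is the normal stop
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical reference CDS: ATG GCA TGG TTT TAA
reference_cds = "ATGGCATGGTTTTAA"
# Hypothetical SNP (G -> A at 0-based position 8) turns TGG into TGA.
variant_cds = reference_cds[:8] + "A" + reference_cds[9:]

reference_hit = first_premature_stop(reference_cds)  # None
variant_hit = first_premature_stop(variant_cds)      # 2
```

Under this view, a call is homozygous when both haplotypes carry the stop and heterozygous when only one does.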
Inbreeding depression One instance of copy number variation Five genes with premature stop codons Seven RH-specific genes
Tuber biology 15,235 genes expressed in the transition from stolons to tubers. 1,217 transcripts with >5-fold expression in stolons versus five RH tuber tissues. 333 transcripts upregulated during the transition from stolons to tubers, in particular proteinase inhibitors such as KTI (Kunitz protease inhibitor)
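A fold-change filter of the kind described on this slide can be sketched as below. The expression values, gene names and tissue names are hypothetical, and two choices are assumptions rather than the study's method: a gene must exceed the threshold against every comparison tissue, and values below a floor of 1.0 are raised to the floor so that near-zero expression does not produce inflated ratios.

```python
def over_expressed(expr, target, others, min_fold=5.0, floor=1.0):
    """Return genes whose expression in `target` is more than `min_fold`
    times their expression in every tissue listed in `others`."""
    hits = []
    for gene, values in expr.items():
        target_value = max(values[target], floor)
        if all(target_value > min_fold * max(values[o], floor) for o in others):
            hits.append(gene)
    return hits


# Hypothetical expression table (arbitrary units).
toy_expr = {
    "geneA": {"stolon": 60.0, "young_tuber": 5.0, "mature_tuber": 10.0},
    "geneB": {"stolon": 20.0, "young_tuber": 5.0, "mature_tuber": 3.0},
    "geneC": {"stolon": 0.2, "young_tuber": 0.0, "mature_tuber": 0.0},
}
stolon_enriched = over_expressed(
    toy_expr, "stolon", ["young_tuber", "mature_tuber"]
)
```

Swapping `target` and `others` gives the opposite comparison, for example genes upregulated in tuber tissues relative to stolons.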
KTI family 28 Kunitz protease inhibitor genes (KTIs)
KTI family
Starch synthesis
Flowering time regulation for tuber induction
Disease resistance Many NBS-LRR genes (39.4%) are pseudogenes owing to indels, frameshift mutations, or premature stop codons, including R1, R3a and others, which might be driven by the rapid evolution of effector genes in the potato late blight pathogen, Phytophthora infestans
Acknowledgement