Supporting Information


Das et al. 10.1073/pnas.1302500110

< SP >< LRRNT > < LRR1 > < LRRV1 > < LRRV2
Pm-VLRC  MGFVVALLVLGAWCGSCSAQ-RQRACVEAGKSDVCICSSATDSSPETVDCSSKTLATVPTGIPASTERLELQYNQLANIHAKAFHGLTRLTYLTLEQNKLQSLPVGVFDQLKDLN
Lc-VLRC  MGFVVALLVLGAWCGSCSAQGQRRACLAVGKDDICTCSNKTDSSPETVDCSSKKLTAVPTGIPANTERLELQYNQLTAVPANAFKALTQLTYLNLDSNQLQSLPVGVFDQLKNLN
Lp-VLRC  MGFVVALLVLGAWCGSCSAQGRERACFAAGKDDLCTCSNKTESSPETVDCSSPKLTTVPTGIPASTERLELQYNQLQTLPAGVFDQLTELGTLYLTTNQLKSLPPGVFDRLTKLT

> < LRRV3 > < LRRV4 > < CP > < LRRCT
Pm-VLRC  ELHLSINELKSLPSGVFDRLTKLKELWLNSNQLQSVPDGVFDKLGSLERLDLEQNQLQSVPDGAFDSLGKLELLDLQNNPWDCECASIIYFVNWLKKNPKHDSGASCEKPSGTAV
Lc-VLRC  ELRLSNNQLKSLPERVFDSLTRLTYLNLAQNQLQSIPKGAFDKLTKLETLHLQTNKLQSVPEGAFDNLVDMQNMQLHDNPWDCECASIIYFVNWLKENPKHDSGASCKKPTGTAV
Lp-VLRC  LLGLEQNQLQSIPKGVFDRLTNLQDLRLSTNQLQSVPHGAFDRLTNLQELRLYNNQLQSVPDGAFDSLTKVEMLQLHNNPWDCECASIIYFVNWLKENPKHDSGASCEKPAGTAV

> < C-terminus >
Pm-VLRC  KDVNTELIEDVPCKHEIPTPKMTASPPNTATSVFTTELNSTTYPNATHEH---TDVCNMPFVSHICLLFCNLFSTCSLCFIIKPLHRY
Lc-VLRC  KDVKTKDVKNVPCNHVYPTSKITASSPTPATSIFIKKLNSTTNLNAIHEHRTHTDVCNMPFVSHMCLLFCNLFSTCSLCFIIKPLHRY
Lp-VLRC  KDVKTEPIKNVPCKHVYPTPKITASSPTPATPIFIPELNSTTNLNAIHEHRTHTDVCNMPFVTHMCLLFCNLFSTCSLCFIIKPLHRY

Fig. S1. Comparison of mature variable lymphocyte receptor C (VLRC) in sea lamprey (Petromyzon marinus), arctic lamprey (Lethenteron camtschaticum), and European brook lamprey (Lampetra planeri) (GenBank accession nos. KC244058, AB507373, and KC247681, respectively).

Query (1st round: 60 mature VLRCs; 2nd round: sequences retrieved in the 1st round)
BLASTn search against the lamprey genome sequence (E-value 1e-5; identity 80%; length 30 nt)
Candidate sequences
Exclusion of overlapping sequences
Retrieval of only the largest non-overlapping genomic fragments, with a 300-nt extension upstream and downstream
Identification of potential boundaries based on conservation in LRR modules and a similarity search using the SMART database
Hypothetical translation in three reading frames
Selection of in-frame sequences by alignment with mature VLRCs
VLRC-encoding genomic donor cassettes

Fig. S2. Flowchart for identification of VLRC-encoding genomic cassettes in the P. marinus genome.
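The filtering and translation steps of the Fig. S2 flowchart can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the function names, the direction of each threshold (E-value at most 1e-5, identity at least 80%, length at least 30 nt), and the greedy longest-first rule for resolving overlaps are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one BLASTn hit (1-based, inclusive coordinates)."""
    scaffold: str
    start: int
    end: int
    identity: float  # percent identity
    evalue: float

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def passes_thresholds(hit, max_evalue=1e-5, min_identity=80.0, min_length=30):
    """Apply the three flowchart cut-offs (threshold directions are assumed)."""
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.length >= min_length)


def largest_non_overlapping(hits):
    """Greedy choice (an assumption): keep the longest hit in each overlapping group."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.length, reverse=True):
        overlaps = any(k.scaffold == hit.scaffold
                       and hit.start <= k.end and k.start <= hit.end
                       for k in kept)
        if not overlaps:
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.scaffold, h.start))


def extend(hit, scaffold_length, flank=300):
    """Add the 300-nt flanks, clipped to the scaffold boundaries."""
    return max(1, hit.start - flank), min(scaffold_length, hit.end + flank)


def translate_three_frames(seq):
    """Translate the forward strand in frames 0, 1, 2; unknown codons become 'X'."""
    seq = seq.upper()
    frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


# Made-up hits, for illustration only.
hits = [
    Hit("scaffold_A", 1000, 1086, 92.0, 1e-20),
    Hit("scaffold_A", 1050, 1100, 85.0, 1e-8),   # overlaps the first, shorter
    Hit("scaffold_A", 2000, 2020, 99.0, 1e-6),   # too short
    Hit("scaffold_A", 3000, 3086, 75.0, 1e-30),  # identity too low
]
candidates = largest_non_overlapping([h for h in hits if passes_thresholds(h)])
for hit in candidates:
    print(hit.scaffold, extend(hit, scaffold_length=5000))
print(translate_three_frames("ATGGGCTTCTAA"))
```

A reverse-strand cassette would need to be reverse-complemented before translation; that step is left out here to keep the sketch short.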

3' LRRNT-5' LRR1
TGCAGTcACAAGAAGCTGGCCACTGTTCCcACTGGGATTCCtgCAAGCACCGAgAaaCTAcAGCTaCActtCAACCAGCTGgCAAGC

3' LRR1-5' LRRV
AACcAgCTGaCaagcaTccccGncAagGCgTTtcanggtCTCaCTcagcTcACtttCCTcgnccTcancaacAAcaagcTGcagTCt

3' LRRV-5' LRRV
aatcagcTgcagagtnTtcccgaagGagtgTTtgAtaaaCTcaccaaccTgaaaacgcTgnacCTGcacancAatcagcTgcagagc

3' LRRV-CP-5' LRRCT
aataagttgcagAGcGTTCCTgAcGGggcnTTtGAcAgCCTcgccaaccTggagaccaTgaatCTccacaaCAaCCCCTGGgAttGt

Fig. S3. Sequence signatures for frequently used genomic donor cassettes. Presumptive consensus sequences are shown below. Only those genomic cassettes that appeared three or more times in mature VLRCs in the present dataset (60 sequences) were considered. Conserved regions that could potentially be used for the assembly process are indicated by horizontal lines.

Tree labels: 3' LRRV-CP-5' LRRCT; 3' LRR1-5' LRRV; 3' LRRNT-5' LRR1; Outgroup (LRRCT)

Fig. S4. Neighbor-joining phylogenetic tree of VLRC-encoding donor cassettes. The tree is condensed at the 50% bootstrap value level. The single circle and double circles (blue) indicate that the interior branches are supported by >75% and >95% bootstrap values, respectively. Colored symbols in the genomic cassettes correspond to those shown in Fig. 2. The tree was constructed using the pairwise deletion option and the p-distance method. Two C-terminal LRR (LRRCT)-encoding donor cassettes served as an outgroup.
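The p-distance with pairwise deletion named in the Fig. S4 legend can be illustrated with a short Python sketch. The function names and the choice to treat `-`, `N`, and `?` as missing data are assumptions for this example; the code only builds the distance matrix that a neighbor-joining method would take as input and does not construct the tree itself.

```python
MISSING = set("-N?")  # characters treated as missing data (an assumption)


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a site is skipped only when it is missing in one of
    the two sequences being compared, not when it is missing elsewhere in
    the alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_i, name_j): p-distance} for every unordered pair."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Toy alignment, for illustration only.
toy = {"c1": "ACGT-A", "c2": "ACCTTA", "c3": "ACGTTA"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

In the toy alignment, `c1` and `c2` are compared over five sites because the gapped column is dropped for that pair only, while `c2` and `c3` are compared over all six.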

Tree labels: 3' LRR1-5' LRRV; 3' LRRV-CP-5' LRRCT; 3' LRRNT-5' LRR1; Outgroup (LRRCT)

Fig. S5. Maximum likelihood phylogenetic tree (condensed at the 50% bootstrap value level) of VLRC-encoding donor cassettes. The single circle and double circles (blue) indicate that the interior branches are supported by >75% and >95% bootstrap values, respectively. The colored symbols in the genomic cassettes correspond to those shown in Fig. 2.

Panels: Non-repetitious cassette assembly (mature VLRC built from donor cassettes a, d, c, f); Repetitious cassette assembly (mature VLRC built from donor cassettes a, b, b, d). Mature VLRC domain order in both panels: SP, LRRNT, LRR1, LRRV, LRRV, LRRV, LRRV, LRRV, CP, LRRCT, Stalk. Genomic donor cassettes are labeled a to g.

Fig. S6. Nonrepetitious and repetitious donor genomic cassettes used in VLRC assembly. (Left) The 3' LRRV-5' LRRV donor cassettes used in VLRC assembly. This nonrepetitious donor genomic cassette assembly pattern is seen in the majority of mature VLRCs. (Right) The same donor cassette (3' LRRV-5' LRRV) is used repeatedly during VLRC assembly. Repeated use of the same donor cassette can be contiguous (as shown in the cartoon) or noncontiguous.

Table S1. Types of genomic VLRC donor cassettes

Donor cassette type   No.  Comments
3' LRRNT-5' LRR1       13  Seven cassettes had high divergence in the base composition at either the 5' region or the 3' region.
3' LRR1-5' LRRV        10  No internal stop codon or high divergence in base composition was found in any region. Two cassettes located in the GL476965 scaffold appear to be recent duplicates.
3' LRRV-5' LRRV       103  Thirty-one cassettes have either an internal stop codon or high divergence in the base composition at the 5' or 3' region. Two partial cassettes were found resulting from the incomplete genome sequence. Multiple potential duplication events were identified.
3' LRRV-CP-5' LRRCT    54  Twelve cassettes had either an internal stop codon or high divergence in the base composition at the 5' or 3' region. Multiple potential duplication events were identified.
LRRCT                   2  These cassettes were located near the incomplete VLRC gene. The LRRCT was encoded either by these two donor cassettes or by the LRRCT-encoding region of the incomplete VLRC gene.

Table S2. VLRC-encoding loci and donor cassettes

Scaffold      Start        End  Strand   Description
GL476420    250,127    250,213  Reverse  3' LRRNT-5' LRR1*
GL476420    256,576    256,662  Reverse  3' LRRNT-5' LRR1
GL476420    260,888    260,974  Reverse  3' LRRNT-5' LRR1*
GL476420    266,986    267,072  Reverse  3' LRRV-5' LRRV
GL476420    269,165    269,251  Reverse  3' LRRV-5' LRRV
GL476420    272,800    272,886  Reverse  3' LRRV-5' LRRV*
GL476420    273,267    273,353  Reverse  3' LRRV-5' LRRV
GL476420    331,336    331,422  Reverse  3' LRRV-5' LRRV
GL476420    332,339    332,425  Reverse  3' LRRV-5' LRRV
GL476420    368,280    368,366  Reverse  3' LRRV-5' LRRV
GL476420    638,444    638,572  Reverse  LRRCT2*
GL476420    663,284    663,370  Reverse  3' LRRNT-5' LRR1
GL476420    666,909    667,037  Reverse  LRRCT1*
GL476420    675,641    676,992  Reverse  VLRC exon 2
GL476420    686,651    686,723  Reverse  VLRC exon 1
GL476420    770,352    771,104  Reverse  NonLTR/Penelope
GL480692     26,146     26,232  Forward  3' LRRNT-5' LRR1
GL480692     29,100     29,186  Forward  3' LRRNT-5' LRR1
GL480692     33,793     33,879  Forward  3' LRRNT-5' LRR1
GL489265        171        257  Reverse  3' LRRNT-5' LRR1
GL489265      3,017      3,106  Forward  3' LRR1-5' LRRV*
GL489265      4,806      4,895  Forward  3' LRR1-5' LRRV
GL489265      6,062      6,148  Reverse  3' LRRV-5' LRRV*
GL489265      7,048      7,137  Forward  3' LRR1-5' LRRV*
GL487051      7,696      7,782  Reverse  3' LRRV-5' LRRV
GL479755     10,198     10,284  Forward  3' LRRV-5' LRRV
GL479755     17,070     17,156  Forward  3' LRRV-5' LRRV*
GL479755     27,611     27,697  Forward  3' LRRV-5' LRRV
GL484871      1,369      1,455  Reverse  3' LRRV-5' LRRV*
GL484871      2,421      2,507  Forward  3' LRRV-5' LRRV
GL484871      3,722      3,808  Forward  3' LRRV-5' LRRV*
GL484871      4,409      4,495  Forward  3' LRRV-5' LRRV*
GL484871      5,147      5,233  Forward  3' LRRV-5' LRRV
GL484871      6,347      6,433  Forward  3' LRRV-5' LRRV
GL484871      6,935      7,021  Forward  3' LRRV-5' LRRV*
GL484871      7,782      7,868  Forward  3' LRRV-5' LRRV
GL484871      8,078      8,164  Forward  3' LRRV-CP-5' LRRCT
GL484871      9,258      9,344  Forward  3' LRRV-5' LRRV
GL484871      9,733      9,819  Forward  3' LRRV-5' LRRV
GL484871     10,179     10,265  Forward  3' LRRV-CP-5' LRRCT
GL484871     11,359     11,445  Forward  3' LRRV-5' LRRV
GL484871     11,827     11,913  Forward  3' LRRV-5' LRRV
GL484871     12,131     12,217  Forward  3' LRRV-CP-5' LRRCT
GL478588      8,216      8,302  Reverse  3' LRRV-5' LRRV
GL478588      8,982      9,068  Reverse  3' LRRV-5' LRRV
GL478588     12,762     12,848  Forward  3' LRRV-5' LRRV
GL478588     23,466     23,552  Forward  3' LRRV-5' LRRV
GL480568      1,276      1,362  Reverse  3' LRRV-5' LRRV*
GL480568      2,328      2,414  Forward  3' LRRV-5' LRRV
GL480568      3,734      3,820  Forward  3' LRRV-5' LRRV*
GL480568      4,421      4,507  Forward  3' LRRV-5' LRRV
GL480568      5,159      5,245  Forward  3' LRRV-5' LRRV
GL480568      6,359      6,445  Forward  3' LRRV-5' LRRV
GL480568      6,947      7,033  Forward  3' LRRV-5' LRRV*
GL480568      7,413      7,499  Forward  3' LRRV-5' LRRV
GL480568      7,709      7,795  Forward  3' LRRV-CP-5' LRRCT
GL480568      8,879      8,965  Forward  3' LRRV-5' LRRV
GL480568      9,347      9,433  Forward  3' LRRV-5' LRRV
GL480568      9,651      9,737  Forward  3' LRRV-CP-5' LRRCT
GL480568     10,842     10,928  Forward  3' LRRV-5' LRRV*
GL480568     11,801     11,887  Forward  3' LRRV-5' LRRV
GL480568     12,090     12,176  Forward  3' LRRV-CP-5' LRRCT
GL480568     17,618     17,704  Forward  3' LRRV-5' LRRV

Table S2. Cont.

Scaffold      Start        End  Strand   Description
GL480568     18,107     18,195  Forward  3' LRRV-5' LRRV
GL480568     19,389     19,475  Forward  3' LRRV-CP-5' LRRCT
GL480568     20,016     20,102  Forward  3' LRRV-5' LRRV*
GL480568     21,733     21,819  Forward  3' LRRV-5' LRRV
GL480568     23,027     23,113  Forward  3' LRRV-CP-5' LRRCT
GL480568     23,656     23,742  Forward  3' LRRV-5' LRRV*
GL485987      1,966      2,052  Forward  3' LRRV-5' LRRV
GL485987      5,404      5,490  Reverse  3' LRRV-5' LRRV
GL485987     10,140     10,226  Reverse  3' LRRV-5' LRRV
GL485987     11,649     11,735  Reverse  3' LRRV-5' LRRV
GL485987     12,915     13,001  Reverse  3' LRRV-5' LRRV
GL485987     17,213     17,299  Reverse  3' LRRV-5' LRRV
GL485987     18,877     18,963  Reverse  3' LRRV-5' LRRV
GL485987     20,442     20,528  Reverse  3' LRRV-5' LRRV
GL485987     21,709     21,795  Reverse  3' LRRV-5' LRRV
GL492517      3,030      3,116  Forward  3' LRRV-CP-5' LRRCT
GL492517      3,745      3,831  Forward  3' LRRV-CP-5' LRRCT*
GL476666    719,333    719,419  Forward  3' LRRV-5' LRRV
GL476666    720,490    720,576  Reverse  3' LRRV-5' LRRV*
GL476666    723,185    723,271  Reverse  3' LRRV-5' LRRV
GL476666    723,560    723,646  Forward  3' LRRV-5' LRRV
GL481936      5,683      5,769  Forward  3' LRRV-5' LRRV*
GL481936      8,002      8,088  Forward  3' LRRV-5' LRRV
GL481936     10,280     10,366  Reverse  3' LRRV-5' LRRV
GL481936     12,673     12,759  Reverse  3' LRRV-5' LRRV
GL481936     14,817     14,903  Reverse  3' LRRV-5' LRRV
GL481936     20,006     20,092  Reverse  3' LRRV-5' LRRV
GL478984     23,664     23,750  Forward  3' LRRV-5' LRRV
GL478984     25,026     25,112  Forward  3' LRRV-5' LRRV*
GL478984     25,670     25,756  Forward  3' LRRV-5' LRRV
GL478984    171,474    171,560  Forward  3' LRRV-CP-5' LRRCT*
GL478984    172,567    172,653  Reverse  3' LRRV-5' LRRV*
GL478984    173,427    173,513  Forward  3' LRRV-5' LRRV*
GL478984    176,594    176,680  Forward  3' LRR1-5' LRRV
GL478984    178,654    178,740  Forward  3' LRR1-5' LRRV
GL478984    180,146    180,232  Reverse  3' LRRV-5' LRRV
GL478984    181,242    181,328  Forward  3' LRRV-5' LRRV*
GL478984    182,163    182,249  Forward  3' LRRV-5' LRRV
GL478984    186,280    186,366  Forward  3' LRRV-5' LRRV
GL476965        351        437  Reverse  3' LRR1-5' LRRV*
GL476965      4,866      4,952  Reverse  3' LRR1-5' LRRV
GL476965     19,152     19,238  Forward  3' LRR1-5' LRRV
GL476965     19,773     19,859  Reverse  3' LRR1-5' LRRV
GL476965    119,012    119,098  Forward  3' LRRV-CP-5' LRRCT
GL476965    119,300    119,386  Reverse  3' LRRV-CP-5' LRRCT
GL476965    120,299    120,385  Forward  3' LRRV-CP-5' LRRCT
GL476965    120,843    120,929  Forward  3' LRRV-CP-5' LRRCT
GL476965    121,837    121,923  Forward  3' LRRV-CP-5' LRRCT
GL476965    122,155    122,241  Reverse  3' LRRV-CP-5' LRRCT
GL476965    127,369    127,455  Forward  3' LRRV-CP-5' LRRCT
GL476965    127,787    127,873  Forward  3' LRRV-CP-5' LRRCT
GL476965    128,779    128,865  Forward  3' LRRV-CP-5' LRRCT
GL476965    129,097    129,183  Reverse  3' LRRV-CP-5' LRRCT
GL476965    130,103    130,189  Forward  3' LRRV-CP-5' LRRCT
GL476965    130,609    130,695  Forward  3' LRRV-CP-5' LRRCT
GL476965    131,861    131,947  Forward  3' LRRV-CP-5' LRRCT
GL476965    132,174    132,260  Reverse  3' LRRV-CP-5' LRRCT
GL476965    135,369    135,455  Forward  3' LRRV-CP-5' LRRCT
GL476965    135,972    136,058  Forward  3' LRRV-CP-5' LRRCT
GL476965    137,737    137,823  Forward  3' LRRV-CP-5' LRRCT*
GL476965    138,055    138,141  Reverse  3' LRRV-CP-5' LRRCT
GL476965    139,051    139,137  Forward  3' LRRV-CP-5' LRRCT*

Table S2. Cont.

Scaffold      Start        End  Strand   Description
GL476965    140,464    140,550  Forward  3' LRRV-CP-5' LRRCT
GL476965    141,019    141,105  Reverse  3' LRRV-CP-5' LRRCT
GL476965    146,302    146,388  Reverse  3' LRRV-CP-5' LRRCT
GL476965    153,774    153,863  Forward  3' LRRV-5' LRRV
GL476965    154,172    154,258  Forward  3' LRRV-5' LRRV
GL476965    160,010    160,096  Reverse  3' LRRV-CP-5' LRRCT
GL476965    160,349    160,435  Forward  3' LRRV-5' LRRV
GL476965    160,989    161,075  Forward  3' LRRV-5' LRRV
GL476965    162,883    162,969  Forward  3' LRRV-5' LRRV
GL476965    163,656    163,742  Reverse  3' LRRV-5' LRRV
GL476965    267,984    268,070  Reverse  3' LRRV-5' LRRV
GL476965    268,638    268,724  Reverse  3' LRRV-5' LRRV*
GL476965    271,953    272,039  Forward  3' LRRV-5' LRRV
GL476965    273,996    274,082  Forward  3' LRRV-5' LRRV
GL476965    276,880    276,966  Reverse  3' LRRV-5' LRRV
GL476965    282,091    282,177  Reverse  3' LRRV-5' LRRV
GL476965    283,821    283,907  Forward  3' LRRV-5' LRRV
GL476965    285,362    285,448  Reverse  3' LRRV-5' LRRV
GL476332     20,641     20,727  Reverse  3' LRRV-5' LRRV
GL487538      2,213      2,299  Forward  3' LRRV-5' LRRV
GL480812        397        483  Forward  3' LRRV-CP-5' LRRCT
GL480812      1,841      1,927  Forward  3' LRRV-CP-5' LRRCT*
GL480812      2,699      2,785  Reverse  3' LRRV-CP-5' LRRCT
GL480812      3,209      3,295  Forward  3' LRRV-CP-5' LRRCT
GL480812      3,603      3,689  Forward  3' LRRV-CP-5' LRRCT*
GL480812      6,896      6,982  Forward  3' LRRV-CP-5' LRRCT
GL480812      7,961      8,047  Reverse  3' LRRV-CP-5' LRRCT*
GL480812      8,559      8,645  Forward  3' LRRV-CP-5' LRRCT
GL480812     11,366     11,452  Forward  3' LRRV-5' LRRV
GL480812     12,118     12,204  Reverse  3' LRRV-5' LRRV*
GL480812     13,229     13,315  Forward  3' LRRV-5' LRRV
GL480812     17,774     17,860  Forward  3' LRRV-CP-5' LRRCT
GL480812     18,281     18,367  Reverse  3' LRRV-5' LRRV*
GL480812     19,285     19,371  Forward  3' LRRV-CP-5' LRRCT*
GL480812     19,755     19,841  Reverse  3' LRRV-5' LRRV
GL480812     21,307     21,393  Forward  3' LRRV-CP-5' LRRCT
GL480812     23,246     23,332  Reverse  3' LRRV-5' LRRV
GL480812     24,129     24,215  Reverse  3' LRR1-5' LRRV
GL480881     20,474     20,560  Reverse  3' LRRV-CP-5' LRRCT
GL480881     20,774     20,860  Reverse  3' LRRV-CP-5' LRRCT
GL480881     22,194     22,280  Reverse  3' LRRV-CP-5' LRRCT
GL480881     27,480     27,566  Reverse  3' LRRV-CP-5' LRRCT
GL480881     29,259     29,345  Reverse  3' LRRV-CP-5' LRRCT
GL480881     32,548     32,634  Reverse  3' LRRV-CP-5' LRRCT
GL480881     38,777     38,863  Reverse  3' LRRV-CP-5' LRRCT
GL480881     41,401     41,487  Reverse  3' LRRV-CP-5' LRRCT
GL481730      8,315      8,398  Reverse  3' LRRV-CP-5' LRRCT
GL481730     12,826     12,912  Reverse  3' LRRNT-5' LRR1
GL481730     14,110     14,196  Reverse  3' LRRNT-5' LRR1
GL481730     15,703     15,789  Reverse  3' LRRNT-5' LRR1
GL481730     18,312     18,398  Reverse  3' LRRNT-5' LRR1
GL492231        911        991  Forward  3' LRRNT-5' LRR1
GL483826          1         73  Reverse  3' LRRV-5' LRRV
GL483826      7,426      7,512  Forward  3' LRRV-5' LRRV*
GL476399  3,251,235  3,251,321  Forward  3' LRRV-5' LRRV
GL476399  3,254,583  3,254,669  Forward  3' LRRV-5' LRRV
GL477382    173,841    173,927  Forward  3' LRRV-5' LRRV
GL477382     17,925     18,011  Forward  3' LRRV-5' LRRV
GL487899     12,056     12,142  Forward  3' LRRV-5' LRRV
GL487899     12,919     13,005  Forward  3' LRRV-5' LRRV*

*Genomic donor cassettes appearing at least three times in the dataset of 60 mature VLRCs.
Genomic donor cassettes appearing seven or more times in the dataset of 60 mature VLRCs.
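Rows of Table S2 lend themselves to simple tabulation. The Python sketch below parses a few rows typed in by hand as whitespace-separated text and reports each span's length and a tally by description. The `Cassette` record and `parse_row` helper are hypothetical names, and the length formula assumes 1-based, inclusive coordinates, which the table does not state explicitly.

```python
from collections import Counter
from typing import NamedTuple


class Cassette(NamedTuple):
    """Hypothetical record for one Table S2 row."""
    scaffold: str
    start: int
    end: int
    strand: str
    kind: str      # description with any trailing asterisk removed
    starred: bool  # True when the description carried an asterisk

    @property
    def length(self) -> int:
        # Assumes 1-based, inclusive coordinates.
        return self.end - self.start + 1


def parse_row(line: str) -> Cassette:
    """Split one row into scaffold, start, end, strand, and description."""
    scaffold, start, end, strand, description = line.split(None, 4)
    description = description.strip()
    starred = description.endswith("*")
    return Cassette(
        scaffold,
        int(start.replace(",", "")),
        int(end.replace(",", "")),
        strand,
        description.rstrip("*").strip(),
        starred,
    )


# A few rows copied from Table S2.
rows = [
    "GL476420 250,127 250,213 Reverse 3' LRRNT-5' LRR1*",
    "GL489265 3,017 3,106 Forward 3' LRR1-5' LRRV*",
    "GL476420 638,444 638,572 Reverse LRRCT2*",
    "GL483826 1 73 Reverse 3' LRRV-5' LRRV",
]
cassettes = [parse_row(r) for r in rows]
for c in cassettes:
    print(c.scaffold, c.kind, c.length, "starred" if c.starred else "")
print(Counter(c.kind for c in cassettes))
```

Under the inclusive-coordinate assumption, most entries in the table span 87 nt, which makes shorter entries such as the 73-nt span at the start of GL483826 easy to pick out.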

Table S3. Characterization of partial VLRC assemblies of L. planeri

Clone          Type of assembly  Description                                                  GenBank accession no.
VLRC#8_TT_6    3' assembly       Insertion of LRRCT module A*                                 KC247673
VLRC#8_TT_16   3' assembly       Insertion of LRRCT module B*                                 KC247674
VLRC#8_TT_45   3' assembly       Insertions of LRRCT module A* and CP module X                KC247675
VLRC#8_TT_36   3' assembly       Insertions of LRRCT module A*, CP module Y, and LRRV module  KC247676
VLRC#8_TT_13   5' assembly       Insertion of incomplete LRR1 module                          KC247677
VLRC#8_TT_108  5' assembly       Insertion of complete LRR1 module                            KC247678
VLRC#8_TT_44   5' assembly       Insertions of LRR1 and LRRV modules                          KC247679

The sequence of the VLRC gene has been deposited in GenBank (accession no. KC247680).
*Both genomic C-terminal LRR (LRRCT) modules encode a sequence that is 2 aa residues longer than that of the germ-line sequence (similar to the situation in P. marinus).

Table S4. Primers used in this study

Primer name  Primer sequence (5'→3')     Species     Location                Use
VLRC-5UTR_F  AGTGTTGGGTCCCGTGCG          P. marinus  5'-UTR                  Primary amplification
VLRC-3UTR_R  ACGGGGATGTCTCTACTTTA        P. marinus  3'-UTR                  Primary amplification
VLRC5.1      CTGAAACTGTTGACTGCAGTAGC     L. planeri  LRRNT                   Primary amplification
VLRC5.2      GACTGGGATTCCTGCAAACACCGAG   L. planeri  LRR1                    Heminested amplification
VLRC_3       CAAAAGGCATGTTACACACATCCGTG  L. planeri  C terminus              Primary amplification
VLRC_5U      GCCGAGCCGCGATGGGGTTTGTCGTG  L. planeri  5'-UTR; signal peptide  Primary amplification
VLRC_3U      CATATTTTTGTCGCCATGCAACG     L. planeri  3'-UTR                  Primary amplification