Supporting Information
Das et al. 10.1073/pnas.1302500110

Domains: SP, LRRNT, LRR1, LRRV1, LRRV2 (continued)
Pm-VLRC  MGFVVALLVLGAWCGSCSAQ-RQRACVEAGKSDVCICSSATDSSPETVDCSSKTLATVPTGIPASTERLELQYNQLANIHAKAFHGLTRLTYLTLEQNKLQSLPVGVFDQLKDLN
Lc-VLRC  MGFVVALLVLGAWCGSCSAQGQRRACLAVGKDDICTCSNKTDSSPETVDCSSKKLTAVPTGIPANTERLELQYNQLTAVPANAFKALTQLTYLNLDSNQLQSLPVGVFDQLKNLN
Lp-VLRC  MGFVVALLVLGAWCGSCSAQGRERACFAAGKDDLCTCSNKTESSPETVDCSSPKLTTVPTGIPASTERLELQYNQLQTLPAGVFDQLTELGTLYLTTNQLKSLPPGVFDRLTKLT

Domains: LRRV2 (end), LRRV3, LRRV4, CP, LRRCT (continued)
Pm-VLRC  ELHLSINELKSLPSGVFDRLTKLKELWLNSNQLQSVPDGVFDKLGSLERLDLEQNQLQSVPDGAFDSLGKLELLDLQNNPWDCECASIIYFVNWLKKNPKHDSGASCEKPSGTAV
Lc-VLRC  ELRLSNNQLKSLPERVFDSLTRLTYLNLAQNQLQSIPKGAFDKLTKLETLHLQTNKLQSVPEGAFDNLVDMQNMQLHDNPWDCECASIIYFVNWLKENPKHDSGASCKKPTGTAV
Lp-VLRC  LLGLEQNQLQSIPKGVFDRLTNLQDLRLSTNQLQSVPHGAFDRLTNLQELRLYNNQLQSVPDGAFDSLTKVEMLQLHNNPWDCECASIIYFVNWLKENPKHDSGASCEKPAGTAV

Domains: LRRCT (end), C-terminus
Pm-VLRC  KDVNTELIEDVPCKHEIPTPKMTASPPNTATSVFTTELNSTTYPNATHEH---TDVCNMPFVSHICLLFCNLFSTCSLCFIIKPLHRY
Lc-VLRC  KDVKTKDVKNVPCNHVYPTSKITASSPTPATSIFIKKLNSTTNLNAIHEHRTHTDVCNMPFVSHMCLLFCNLFSTCSLCFIIKPLHRY
Lp-VLRC  KDVKTEPIKNVPCKHVYPTPKITASSPTPATPIFIPELNSTTNLNAIHEHRTHTDVCNMPFVTHMCLLFCNLFSTCSLCFIIKPLHRY

Fig. S1. Comparison of mature variable lymphocyte receptor C (VLRC) in sea lamprey (Petromyzon marinus), arctic lamprey (Lethenteron camtschaticum), and European brook lamprey (Lampetra planeri) (GenBank accession nos. KC244058, AB507373, and KC247681, respectively).

Flowchart steps:
1. Query. First round: 60 mature VLRC sequences. Second round: sequences retrieved in the first-round search.
2. BLASTn search of the lamprey genome sequence (E-value 1e-5, identity 80%, length 30 nt).
3. Candidate sequences.
4. Exclusion of overlapping sequences: only the largest non-overlapping genomic fragments were retrieved, each extended by 300 nt upstream and downstream.
5. Identification of potential boundaries based on the conservation of LRR modules and similarity searches using the SMART database.
6. Hypothetical translation in three reading frames.
7. Selection of in-frame sequences by alignment with mature VLRCs.
Result: VLRC-encoding genomic donor cassettes.

Fig. S2. Flowchart for identification of VLRC-encoding genomic cassettes in the P. marinus genome.
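The filtering, fragment-selection, and translation steps of the Fig. S2 flowchart can be sketched as below. This is an illustrative sketch, not the pipeline actually used: the `Hit` record, the helper names, the inclusive treatment of the cutoffs, the greedy "largest first" rule for resolving overlaps, and the forward-strand-only translation are all assumptions. Only the numeric thresholds (E-value 1e-5, 80% identity, 30 nt, 300-nt flanks) come from the flowchart.

```python
from __future__ import annotations

from dataclasses import dataclass

# Cutoffs listed in the Fig. S2 flowchart; treating them as inclusive is an assumption.
MAX_EVALUE = 1e-5
MIN_IDENTITY = 80.0  # percent identity
MIN_LENGTH = 30      # aligned length, nt
FLANK = 300          # nt added upstream and downstream

# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


@dataclass(frozen=True)
class Hit:
    """Hypothetical BLASTn hit record (1-based, inclusive coordinates)."""
    scaffold: str
    start: int
    end: int
    evalue: float
    identity: float  # percent
    length: int      # aligned length, nt


def passes(hit: Hit) -> bool:
    """Apply the E-value, identity, and length cutoffs."""
    return (hit.evalue <= MAX_EVALUE and hit.identity >= MIN_IDENTITY
            and hit.length >= MIN_LENGTH)


def largest_non_overlapping(hits: list[Hit]) -> list[Hit]:
    """Greedy pass: keep a hit only if it overlaps no larger hit already kept."""
    kept: list[Hit] = []
    for h in sorted(hits, key=lambda h: h.end - h.start, reverse=True):
        if all(k.scaffold != h.scaffold or h.end < k.start or h.start > k.end
               for k in kept):
            kept.append(h)
    return kept


def extend(hit: Hit, scaffold_length: int) -> tuple[int, int]:
    """Add FLANK nt on each side, clipped to the scaffold ends."""
    return max(1, hit.start - FLANK), min(scaffold_length, hit.end + FLANK)


def three_frame_translation(seq: str) -> list[str]:
    """Translate the forward strand in frames 0, 1, 2 ('X' for ambiguous codons)."""
    seq = seq.upper()
    return ["".join(CODON.get(seq[i:i + 3], "X") for i in range(f, len(seq) - 2, 3))
            for f in range(3)]


# Toy hits, not data from the study.
hits = [
    Hit("scaffold_A", 100, 186, 1e-20, 92.0, 87),
    Hit("scaffold_A", 150, 200, 1e-8, 85.0, 51),    # overlaps the larger hit
    Hit("scaffold_A", 5000, 5086, 1e-3, 95.0, 87),  # fails the E-value cutoff
    Hit("scaffold_B", 10, 30, 1e-9, 90.0, 21),      # fails the length cutoff
]
kept = largest_non_overlapping([h for h in hits if passes(h)])
print(kept, extend(kept[0], 10_000), three_frame_translation("ATGGCCTAA"))
```

Resolving overlaps largest-first is one simple reading of "retrieval of only non-overlapping largest genomic fragments"; merging overlapping hits into a single interval would be an equally plausible alternative.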
3' LRRNT-5' LRR1
TGCAGTcACAAGAAGCTGGCCACTGTTCCcACTGGGATTCCtgCAAGCACCGAgAaaCTAcAGCTaCActtCAACCAGCTGgCAAGC
3' LRR1-5' LRRV
AACcAgCTGaCaagcaTccccGncAagGCgTTtcanggtCTCaCTcagcTcACtttCCTcgnccTcancaacAAcaagcTGcagTCt
3' LRRV-5' LRRV
aatcagcTgcagagtnTtcccgaagGagtgTTtgAtaaaCTcaccaaccTgaaaacgcTgnacCTGcacancAatcagcTgcagagc
3' LRRV-CP-5' LRRCT
aataagttgcagAGcGTTCCTgAcGGggcnTTtGAcAgCCTcgccaaccTggagaccaTgaatCTccacaaCAaCCCCTGGgAttGt

Fig. S3. Sequence signatures for frequently used genomic donor cassettes. Presumptive consensus sequences are shown below. Only those genomic cassettes that appeared three or more times in mature VLRCs in the present dataset (60 sequences) were considered. Conserved regions that could potentially be used for the assembly process are indicated by horizontal lines.
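A column-wise consensus of the kind summarized in Fig. S3 can be sketched as follows, restricted to cassettes used at least three times as the legend states. The legend does not define the upper/lowercase convention, so the two frequency thresholds (`UPPER`, `LOWER`) and the use of "n" for unresolved columns are hypothetical choices, and the toy sequences and usage counts are invented for illustration.

```python
from __future__ import annotations

from collections import Counter

MIN_USES = 3  # Fig. S3 legend: cassettes seen three or more times in the 60 mature VLRCs

# Hypothetical case convention (not stated in the legend):
# uppercase if the majority base reaches UPPER, lowercase if it reaches LOWER, else "n".
UPPER = 0.9
LOWER = 0.5


def consensus(aligned: list[str]) -> str:
    """Column-wise majority consensus of equal-length, gap-aligned DNA strings."""
    out = []
    for column in zip(*aligned):
        bases = [b.upper() for b in column if b != "-"]
        if not bases:
            out.append("-")  # column is all gaps
            continue
        base, count = Counter(bases).most_common(1)[0]
        freq = count / len(bases)
        if freq >= UPPER:
            out.append(base)
        elif freq >= LOWER:
            out.append(base.lower())
        else:
            out.append("n")
    return "".join(out)


# (aligned cassette sequence, times observed in mature VLRCs): toy values, not study data.
cassettes = [("ACGTA", 5), ("ACGAC", 3), ("ACTTG", 4), ("ACGTT", 1)]
frequent = [seq for seq, uses in cassettes if uses >= MIN_USES]
print(consensus(frequent))
```

With these toy inputs the fourth cassette is excluded by the usage filter, and the remaining three give a consensus mixing fully conserved (uppercase), majority (lowercase), and unresolved ("n") columns.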
[Figure: neighbor-joining tree with clades labeled 3' LRRV-CP-5' LRRCT, 3' LRR1-5' LRRV, and 3' LRRNT-5' LRR1, and an LRRCT outgroup.]
Fig. S4. Neighbor-joining phylogenetic tree of VLRC-encoding donor cassettes. The tree is condensed at the 50% bootstrap value level. Single and double blue circles indicate interior branches supported by >75% and >95% bootstrap values, respectively. Colored symbols on the genomic cassettes correspond to those shown in Fig. 2. The tree was constructed using the p-distance method with the pairwise deletion option. Two C-terminal LRR (LRRCT)-encoding donor cassettes served as the outgroup.
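The p-distance with pairwise deletion named in the Fig. S4 legend can be illustrated with the short sketch below, which builds the distance matrix a neighbor-joining routine would take as input (the tree-building step itself is not shown). Function names and the toy alignment are hypothetical; only the distance definition follows the legend.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a column is skipped only if this particular pair has a
    gap there, so each pair is compared over as many sites as possible.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped sites shared by this pair")
    return differences / compared


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric p-distance matrix keyed by (name, name) pairs."""
    d = {(name, name): 0.0 for name in seqs}
    for i, j in combinations(seqs, 2):
        d[i, j] = d[j, i] = p_distance(seqs[i], seqs[j])
    return d


toy = {"c1": "ACGT-A", "c2": "ACTTGA", "c3": "TCTTGA"}  # toy alignment, not study data
matrix = distance_matrix(toy)
print(matrix["c1", "c2"], matrix["c2", "c3"], matrix["c1", "c3"])
```

The gap in `c1` only shortens comparisons that involve `c1`: the `c2`/`c3` pair is still compared over all six columns (1/6), whereas complete deletion would drop that column for every pair and give 1/5.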
[Figure: maximum likelihood tree with clades labeled 3' LRR1-5' LRRV, 3' LRRV-CP-5' LRRCT, and 3' LRRNT-5' LRR1, and an LRRCT outgroup.]
Fig. S5. Maximum likelihood phylogenetic tree (condensed at the 50% bootstrap value level) of VLRC-encoding donor cassettes. Single and double blue circles indicate interior branches supported by >75% and >95% bootstrap values, respectively. Colored symbols on the genomic cassettes correspond to those shown in Fig. 2.
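"Condensed at the 50% bootstrap value level" means that interior branches with lower support are collapsed into polytomies. The sketch below shows that operation on a minimal hypothetical `Node` type, together with the circle markers used in the legends of Figs. S4 and S5; the tree and its support values are invented, and real analyses would use a phylogenetics package rather than this toy structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap %, for interior nodes
    children: list[Node] = field(default_factory=list)


def condense(node: Node, threshold: float = 50.0) -> Node:
    """Return a copy with interior branches below `threshold` collapsed."""
    kept: list[Node] = []
    for child in node.children:
        c = condense(child, threshold)
        if c.children and c.support is not None and c.support < threshold:
            kept.extend(c.children)  # splice the weak node's children into this node
        else:
            kept.append(c)
    return Node(node.name, node.support, kept)


def marker(support: Optional[float]) -> str:
    """Single circle for >75% support, double circle for >95%, as in the legends."""
    if support is None:
        return ""
    if support > 95:
        return "oo"
    if support > 75:
        return "o"
    return ""


def newick(node: Node) -> str:
    """Newick string with support values written as interior node labels."""
    if not node.children:
        return node.name
    label = "" if node.support is None else f"{node.support:g}"
    return "(" + ",".join(newick(c) for c in node.children) + ")" + label


# Toy tree: ((A,(B,C)96)40,D); the 40% branch falls below the 50% cutoff.
tree = Node(children=[
    Node(support=40, children=[Node("A"), Node(support=96, children=[Node("B"), Node("C")])]),
    Node("D"),
])
print(newick(condense(tree)))
```

Collapsing the 40% branch turns A, the (B,C) clade, and D into a three-way polytomy, while the well-supported (B,C) clade is kept and would carry a double circle.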
[Figure: two panels, "Nonrepetitious cassette assembly" (Left) and "Repetitious cassette assembly" (Right). Each shows a mature VLRC (SP, LRRNT, LRR1, five LRRV, CP, LRRCT, and stalk) and a 5' UTR-SP-LRRNT-LRR1 region with donor cassettes labeled a to g. The Left mature VLRC uses cassettes a, d, c, and f; the Right mature VLRC uses cassettes a, b, b, and d.]
Fig. S6. Nonrepetitious and repetitious donor genomic cassettes used in VLRC assembly. (Left) Different 3' LRRV-5' LRRV donor cassettes are each used once in VLRC assembly. This nonrepetitious donor genomic cassette assembly pattern is seen in the majority of mature VLRCs. (Right) The same donor cassette (3' LRRV-5' LRRV) is used repeatedly during VLRC assembly. Repeated use of the same donor cassette can be contiguous (as shown in the cartoon) or noncontiguous.

Table S1. Types of genomic VLRC donor cassettes
Donor cassette type | No. | Comments
3' LRRNT-5' LRR1 | 13 | Seven cassettes had high divergence in base composition at either the 5' region or the 3' region.
3' LRR1-5' LRRV | 10 | No internal stop codon or high divergence in base composition was found in any region. Two cassettes located in the GL476965 scaffold appear to be recent duplicates.
3' LRRV-5' LRRV | 103 | Thirty-one cassettes had either an internal stop codon or high divergence in base composition at the 5' or 3' region. Two partial cassettes were found because of the incomplete genome sequence. Multiple potential duplication events were identified.
3' LRRV-CP-5' LRRCT | 54 | Twelve cassettes had either an internal stop codon or high divergence in base composition at the 5' or 3' region. Multiple potential duplication events were identified.
LRRCT | 2 | These cassettes were located near the incomplete VLRC gene. The LRRCT was encoded either by one of these two donor cassettes or by the LRRCT-encoding region of the incomplete VLRC gene.
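The distinction drawn in Fig. S6 (each cassette used once, versus the same cassette reused either contiguously or noncontiguously) can be expressed as a small classification function. This is a hypothetical helper, not part of the study's analysis; it takes cassette labels in assembly order, using labels like those in the Fig. S6 cartoon (a, d, c, f versus a, b, b, d).

```python
from __future__ import annotations

from collections import Counter


def classify_assembly(cassettes: list[str]) -> tuple[str, dict[str, str]]:
    """Label an assembly as nonrepetitious or repetitious.

    `cassettes` lists donor-cassette identifiers in the order they appear in a
    mature VLRC. For each cassette used more than once, report whether its
    copies sit next to each other (contiguous) or are separated (noncontiguous).
    """
    repeats: dict[str, str] = {}
    for label, n in Counter(cassettes).items():
        if n < 2:
            continue
        positions = [i for i, c in enumerate(cassettes) if c == label]
        contiguous = positions[-1] - positions[0] == len(positions) - 1
        repeats[label] = "contiguous" if contiguous else "noncontiguous"
    return ("repetitious" if repeats else "nonrepetitious"), repeats


for assembly in (["a", "d", "c", "f"], ["a", "b", "b", "d"], ["a", "b", "d", "b"]):
    print(assembly, classify_assembly(assembly))
```

A run of copies counts as contiguous when its first and last positions are exactly `len(positions) - 1` apart, i.e., no other cassette is interleaved.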
Table S2. VLRC-encoding loci and donor cassettes
Scaffold | Start | End | Strand | Description
GL476420 | 250,127 | 250,213 | Reverse | 3' LRRNT-5' LRR1*
GL476420 | 256,576 | 256,662 | Reverse | 3' LRRNT-5' LRR1
GL476420 | 260,888 | 260,974 | Reverse | 3' LRRNT-5' LRR1*
GL476420 | 266,986 | 267,072 | Reverse | 3' LRRV-5' LRRV
GL476420 | 269,165 | 269,251 | Reverse | 3' LRRV-5' LRRV
GL476420 | 272,800 | 272,886 | Reverse | 3' LRRV-5' LRRV*
GL476420 | 273,267 | 273,353 | Reverse | 3' LRRV-5' LRRV
GL476420 | 331,336 | 331,422 | Reverse | 3' LRRV-5' LRRV
GL476420 | 332,339 | 332,425 | Reverse | 3' LRRV-5' LRRV
GL476420 | 368,280 | 368,366 | Reverse | 3' LRRV-5' LRRV
GL476420 | 638,444 | 638,572 | Reverse | LRRCT2*
GL476420 | 663,284 | 663,370 | Reverse | 3' LRRNT-5' LRR1
GL476420 | 666,909 | 667,037 | Reverse | LRRCT1*
GL476420 | 675,641 | 676,992 | Reverse | VLRC exon 2
GL476420 | 686,651 | 686,723 | Reverse | VLRC exon 1
GL476420 | 770,352 | 771,104 | Reverse | NonLTR/Penelope
GL480692 | 26,146 | 26,232 | Forward | 3' LRRNT-5' LRR1
GL480692 | 29,100 | 29,186 | Forward | 3' LRRNT-5' LRR1
GL480692 | 33,793 | 33,879 | Forward | 3' LRRNT-5' LRR1
GL489265 | 171 | 257 | Reverse | 3' LRRNT-5' LRR1
GL489265 | 3,017 | 3,106 | Forward | 3' LRR1-5' LRRV*
GL489265 | 4,806 | 4,895 | Forward | 3' LRR1-5' LRRV
GL489265 | 6,062 | 6,148 | Reverse | 3' LRRV-5' LRRV*
GL489265 | 7,048 | 7,137 | Forward | 3' LRR1-5' LRRV*
GL487051 | 7,696 | 7,782 | Reverse | 3' LRRV-5' LRRV
GL479755 | 10,198 | 10,284 | Forward | 3' LRRV-5' LRRV
GL479755 | 17,070 | 17,156 | Forward | 3' LRRV-5' LRRV*
GL479755 | 27,611 | 27,697 | Forward | 3' LRRV-5' LRRV
GL484871 | 1,369 | 1,455 | Reverse | 3' LRRV-5' LRRV*
GL484871 | 2,421 | 2,507 | Forward | 3' LRRV-5' LRRV
GL484871 | 3,722 | 3,808 | Forward | 3' LRRV-5' LRRV*
GL484871 | 4,409 | 4,495 | Forward | 3' LRRV-5' LRRV*
GL484871 | 5,147 | 5,233 | Forward | 3' LRRV-5' LRRV
GL484871 | 6,347 | 6,433 | Forward | 3' LRRV-5' LRRV
GL484871 | 6,935 | 7,021 | Forward | 3' LRRV-5' LRRV*
GL484871 | 7,782 | 7,868 | Forward | 3' LRRV-5' LRRV
GL484871 | 8,078 | 8,164 | Forward | 3' LRRV-CP-5' LRRCT
GL484871 | 9,258 | 9,344 | Forward | 3' LRRV-5' LRRV
GL484871 | 9,733 | 9,819 | Forward | 3' LRRV-5' LRRV
GL484871 | 10,179 | 10,265 | Forward | 3' LRRV-CP-5' LRRCT
GL484871 | 11,359 | 11,445 | Forward | 3' LRRV-5' LRRV
GL484871 | 11,827 | 11,913 | Forward | 3' LRRV-5' LRRV
GL484871 | 12,131 | 12,217 | Forward | 3' LRRV-CP-5' LRRCT
GL478588 | 8,216 | 8,302 | Reverse | 3' LRRV-5' LRRV
GL478588 | 8,982 | 9,068 | Reverse | 3' LRRV-5' LRRV
GL478588 | 12,762 | 12,848 | Forward | 3' LRRV-5' LRRV
GL478588 | 23,466 | 23,552 | Forward | 3' LRRV-5' LRRV
GL480568 | 1,276 | 1,362 | Reverse | 3' LRRV-5' LRRV*
GL480568 | 2,328 | 2,414 | Forward | 3' LRRV-5' LRRV
GL480568 | 3,734 | 3,820 | Forward | 3' LRRV-5' LRRV*
GL480568 | 4,421 | 4,507 | Forward | 3' LRRV-5' LRRV
GL480568 | 5,159 | 5,245 | Forward | 3' LRRV-5' LRRV
GL480568 | 6,359 | 6,445 | Forward | 3' LRRV-5' LRRV
GL480568 | 6,947 | 7,033 | Forward | 3' LRRV-5' LRRV*
GL480568 | 7,413 | 7,499 | Forward | 3' LRRV-5' LRRV
GL480568 | 7,709 | 7,795 | Forward | 3' LRRV-CP-5' LRRCT
GL480568 | 8,879 | 8,965 | Forward | 3' LRRV-5' LRRV
GL480568 | 9,347 | 9,433 | Forward | 3' LRRV-5' LRRV
GL480568 | 9,651 | 9,737 | Forward | 3' LRRV-CP-5' LRRCT
GL480568 | 10,842 | 10,928 | Forward | 3' LRRV-5' LRRV*
GL480568 | 11,801 | 11,887 | Forward | 3' LRRV-5' LRRV
GL480568 | 12,090 | 12,176 | Forward | 3' LRRV-CP-5' LRRCT
GL480568 | 17,618 | 17,704 | Forward | 3' LRRV-5' LRRV
GL480568 | 18,107 | 18,195 | Forward | 3' LRRV-5' LRRV
GL480568 | 19,389 | 19,475 | Forward | 3' LRRV-CP-5' LRRCT
GL480568 | 20,016 | 20,102 | Forward | 3' LRRV-5' LRRV*
GL480568 | 21,733 | 21,819 | Forward | 3' LRRV-5' LRRV
GL480568 | 23,027 | 23,113 | Forward | 3' LRRV-CP-5' LRRCT
GL480568 | 23,656 | 23,742 | Forward | 3' LRRV-5' LRRV*
GL485987 | 1,966 | 2,052 | Forward | 3' LRRV-5' LRRV
GL485987 | 5,404 | 5,490 | Reverse | 3' LRRV-5' LRRV
GL485987 | 10,140 | 10,226 | Reverse | 3' LRRV-5' LRRV
GL485987 | 11,649 | 11,735 | Reverse | 3' LRRV-5' LRRV
GL485987 | 12,915 | 13,001 | Reverse | 3' LRRV-5' LRRV
GL485987 | 17,213 | 17,299 | Reverse | 3' LRRV-5' LRRV
GL485987 | 18,877 | 18,963 | Reverse | 3' LRRV-5' LRRV
GL485987 | 20,442 | 20,528 | Reverse | 3' LRRV-5' LRRV
GL485987 | 21,709 | 21,795 | Reverse | 3' LRRV-5' LRRV
GL492517 | 3,030 | 3,116 | Forward | 3' LRRV-CP-5' LRRCT
GL492517 | 3,745 | 3,831 | Forward | 3' LRRV-CP-5' LRRCT*
GL476666 | 719,333 | 719,419 | Forward | 3' LRRV-5' LRRV
GL476666 | 720,490 | 720,576 | Reverse | 3' LRRV-5' LRRV*
GL476666 | 723,185 | 723,271 | Reverse | 3' LRRV-5' LRRV
GL476666 | 723,560 | 723,646 | Forward | 3' LRRV-5' LRRV
GL481936 | 5,683 | 5,769 | Forward | 3' LRRV-5' LRRV*
GL481936 | 8,002 | 8,088 | Forward | 3' LRRV-5' LRRV
GL481936 | 10,280 | 10,366 | Reverse | 3' LRRV-5' LRRV
GL481936 | 12,673 | 12,759 | Reverse | 3' LRRV-5' LRRV
GL481936 | 14,817 | 14,903 | Reverse | 3' LRRV-5' LRRV
GL481936 | 20,006 | 20,092 | Reverse | 3' LRRV-5' LRRV
GL478984 | 23,664 | 23,750 | Forward | 3' LRRV-5' LRRV
GL478984 | 25,026 | 25,112 | Forward | 3' LRRV-5' LRRV*
GL478984 | 25,670 | 25,756 | Forward | 3' LRRV-5' LRRV
GL478984 | 171,474 | 171,560 | Forward | 3' LRRV-CP-5' LRRCT*
GL478984 | 172,567 | 172,653 | Reverse | 3' LRRV-5' LRRV*
GL478984 | 173,427 | 173,513 | Forward | 3' LRRV-5' LRRV*
GL478984 | 176,594 | 176,680 | Forward | 3' LRR1-5' LRRV
GL478984 | 178,654 | 178,740 | Forward | 3' LRR1-5' LRRV
GL478984 | 180,146 | 180,232 | Reverse | 3' LRRV-5' LRRV
GL478984 | 181,242 | 181,328 | Forward | 3' LRRV-5' LRRV*
GL478984 | 182,163 | 182,249 | Forward | 3' LRRV-5' LRRV
GL478984 | 186,280 | 186,366 | Forward | 3' LRRV-5' LRRV
GL476965 | 351 | 437 | Reverse | 3' LRR1-5' LRRV*
GL476965 | 4,866 | 4,952 | Reverse | 3' LRR1-5' LRRV
GL476965 | 19,152 | 19,238 | Forward | 3' LRR1-5' LRRV
GL476965 | 19,773 | 19,859 | Reverse | 3' LRR1-5' LRRV
GL476965 | 119,012 | 119,098 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 119,300 | 119,386 | Reverse | 3' LRRV-CP-5' LRRCT
GL476965 | 120,299 | 120,385 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 120,843 | 120,929 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 121,837 | 121,923 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 122,155 | 122,241 | Reverse | 3' LRRV-CP-5' LRRCT
GL476965 | 127,369 | 127,455 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 127,787 | 127,873 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 128,779 | 128,865 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 129,097 | 129,183 | Reverse | 3' LRRV-CP-5' LRRCT
GL476965 | 130,103 | 130,189 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 130,609 | 130,695 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 131,861 | 131,947 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 132,174 | 132,260 | Reverse | 3' LRRV-CP-5' LRRCT
GL476965 | 135,369 | 135,455 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 135,972 | 136,058 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 137,737 | 137,823 | Forward | 3' LRRV-CP-5' LRRCT*
GL476965 | 138,055 | 138,141 | Reverse | 3' LRRV-CP-5' LRRCT
GL476965 | 139,051 | 139,137 | Forward | 3' LRRV-CP-5' LRRCT*
GL476965 | 140,464 | 140,550 | Forward | 3' LRRV-CP-5' LRRCT
GL476965 | 141,019 | 141,105 | Reverse | 3' LRRV-CP-5' LRRCT
GL476965 | 146,302 | 146,388 | Reverse | 3' LRRV-CP-5' LRRCT
GL476965 | 153,774 | 153,863 | Forward | 3' LRRV-5' LRRV
GL476965 | 154,172 | 154,258 | Forward | 3' LRRV-5' LRRV
GL476965 | 160,010 | 160,096 | Reverse | 3' LRRV-CP-5' LRRCT
GL476965 | 160,349 | 160,435 | Forward | 3' LRRV-5' LRRV
GL476965 | 160,989 | 161,075 | Forward | 3' LRRV-5' LRRV
GL476965 | 162,883 | 162,969 | Forward | 3' LRRV-5' LRRV
GL476965 | 163,656 | 163,742 | Reverse | 3' LRRV-5' LRRV
GL476965 | 267,984 | 268,070 | Reverse | 3' LRRV-5' LRRV
GL476965 | 268,638 | 268,724 | Reverse | 3' LRRV-5' LRRV*
GL476965 | 271,953 | 272,039 | Forward | 3' LRRV-5' LRRV
GL476965 | 273,996 | 274,082 | Forward | 3' LRRV-5' LRRV
GL476965 | 276,880 | 276,966 | Reverse | 3' LRRV-5' LRRV
GL476965 | 282,091 | 282,177 | Reverse | 3' LRRV-5' LRRV
GL476965 | 283,821 | 283,907 | Forward | 3' LRRV-5' LRRV
GL476965 | 285,362 | 285,448 | Reverse | 3' LRRV-5' LRRV
GL476332 | 20,641 | 20,727 | Reverse | 3' LRRV-5' LRRV
GL487538 | 2,213 | 2,299 | Forward | 3' LRRV-5' LRRV
GL480812 | 397 | 483 | Forward | 3' LRRV-CP-5' LRRCT
GL480812 | 1,841 | 1,927 | Forward | 3' LRRV-CP-5' LRRCT*
GL480812 | 2,699 | 2,785 | Reverse | 3' LRRV-CP-5' LRRCT
GL480812 | 3,209 | 3,295 | Forward | 3' LRRV-CP-5' LRRCT
GL480812 | 3,603 | 3,689 | Forward | 3' LRRV-CP-5' LRRCT*
GL480812 | 6,896 | 6,982 | Forward | 3' LRRV-CP-5' LRRCT
GL480812 | 7,961 | 8,047 | Reverse | 3' LRRV-CP-5' LRRCT*
GL480812 | 8,559 | 8,645 | Forward | 3' LRRV-CP-5' LRRCT
GL480812 | 11,366 | 11,452 | Forward | 3' LRRV-5' LRRV
GL480812 | 12,118 | 12,204 | Reverse | 3' LRRV-5' LRRV*
GL480812 | 13,229 | 13,315 | Forward | 3' LRRV-5' LRRV
GL480812 | 17,774 | 17,860 | Forward | 3' LRRV-CP-5' LRRCT
GL480812 | 18,281 | 18,367 | Reverse | 3' LRRV-5' LRRV*
GL480812 | 19,285 | 19,371 | Forward | 3' LRRV-CP-5' LRRCT*
GL480812 | 19,755 | 19,841 | Reverse | 3' LRRV-5' LRRV
GL480812 | 21,307 | 21,393 | Forward | 3' LRRV-CP-5' LRRCT
GL480812 | 23,246 | 23,332 | Reverse | 3' LRRV-5' LRRV
GL480812 | 24,129 | 24,215 | Reverse | 3' LRR1-5' LRRV
GL480881 | 20,474 | 20,560 | Reverse | 3' LRRV-CP-5' LRRCT
GL480881 | 20,774 | 20,860 | Reverse | 3' LRRV-CP-5' LRRCT
GL480881 | 22,194 | 22,280 | Reverse | 3' LRRV-CP-5' LRRCT
GL480881 | 27,480 | 27,566 | Reverse | 3' LRRV-CP-5' LRRCT
GL480881 | 29,259 | 29,345 | Reverse | 3' LRRV-CP-5' LRRCT
GL480881 | 32,548 | 32,634 | Reverse | 3' LRRV-CP-5' LRRCT
GL480881 | 38,777 | 38,863 | Reverse | 3' LRRV-CP-5' LRRCT
GL480881 | 41,401 | 41,487 | Reverse | 3' LRRV-CP-5' LRRCT
GL481730 | 8,315 | 8,398 | Reverse | 3' LRRV-CP-5' LRRCT
GL481730 | 12,826 | 12,912 | Reverse | 3' LRRNT-5' LRR1
GL481730 | 14,110 | 14,196 | Reverse | 3' LRRNT-5' LRR1
GL481730 | 15,703 | 15,789 | Reverse | 3' LRRNT-5' LRR1
GL481730 | 18,312 | 18,398 | Reverse | 3' LRRNT-5' LRR1
GL492231 | 911 | 991 | Forward | 3' LRRNT-5' LRR1
GL483826 | 1 | 73 | Reverse | 3' LRRV-5' LRRV
GL483826 | 7,426 | 7,512 | Forward | 3' LRRV-5' LRRV*
GL476399 | 3,251,235 | 3,251,321 | Forward | 3' LRRV-5' LRRV
GL476399 | 3,254,583 | 3,254,669 | Forward | 3' LRRV-5' LRRV
GL477382 | 173,841 | 173,927 | Forward | 3' LRRV-5' LRRV
GL477382 | 17,925 | 18,011 | Forward | 3' LRRV-5' LRRV
GL487899 | 12,056 | 12,142 | Forward | 3' LRRV-5' LRRV
GL487899 | 12,919 | 13,005 | Forward | 3' LRRV-5' LRRV*
*Genomic donor cassettes appearing at least three times in the dataset of 60 mature VLRCs.
Genomic donor cassettes appearing seven or more times in the dataset of 60 mature VLRCs.
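The Start and End columns of Table S2 can be turned into cassette lengths with simple arithmetic. The sketch below parses a few rows copied from the table (whitespace-separated) and reports length and codon count; `Locus`, `parse_row`, and `length_nt` are hypothetical names, and the 1-based inclusive coordinate convention is an assumption. Under that assumption, for example, 250,213 - 250,127 + 1 = 87 nt, i.e., 29 codons, a length shared by many rows in the table.

```python
from __future__ import annotations

from typing import NamedTuple


class Locus(NamedTuple):
    scaffold: str
    start: int
    end: int
    strand: str
    description: str


def parse_row(row: str) -> Locus:
    """Split a Table S2 row: scaffold, start, end, strand, then free-text description."""
    scaffold, start, end, strand, description = row.split(None, 4)
    return Locus(scaffold, int(start.replace(",", "")), int(end.replace(",", "")),
                 strand, description)


def length_nt(locus: Locus) -> int:
    # Assumes 1-based, inclusive coordinates (hypothetical convention).
    return locus.end - locus.start + 1


rows = [
    "GL476420 250,127 250,213 Reverse 3' LRRNT-5' LRR1*",
    "GL489265 3,017 3,106 Forward 3' LRR1-5' LRRV*",
    "GL481730 8,315 8,398 Reverse 3' LRRV-CP-5' LRRCT",
    "GL476420 638,444 638,572 Reverse LRRCT2*",
]
for row in rows:
    locus = parse_row(row)
    n = length_nt(locus)
    frequent = locus.description.endswith("*")  # * = seen at least three times
    print(f"{locus.scaffold} {locus.description}: {n} nt, {n // 3} codons, "
          f"multiple of 3: {n % 3 == 0}, frequent: {frequent}")
```

Splitting with a maximum of four splits keeps multi-word descriptions such as "3' LRRV-CP-5' LRRCT" intact in the last field.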
Table S3. Characterization of partial VLRC assemblies of L. planeri
Clone | Type of assembly | Description | GenBank accession no.
VLRC#8_TT_6 | 3' assembly | Insertion of LRRCT module A* | KC247673
VLRC#8_TT_16 | 3' assembly | Insertion of LRRCT module B* | KC247674
VLRC#8_TT_45 | 3' assembly | Insertions of LRRCT module A* and CP module X | KC247675
VLRC#8_TT_36 | 3' assembly | Insertions of LRRCT module A*, CP module Y, and LRRV module | KC247676
VLRC#8_TT_13 | 5' assembly | Insertion of incomplete LRR1 module | KC247677
VLRC#8_TT_108 | 5' assembly | Insertion of complete LRR1 module | KC247678
VLRC#8_TT_44 | 5' assembly | Insertions of LRR1 and LRRV modules | KC247679
The sequence of the VLRC gene has been deposited in GenBank (accession no. KC247680).
*Both genomic C-terminal LRR (LRRCT) modules encode a sequence that is two amino acid residues longer than the germ-line sequence (similar to the situation in P. marinus).

Table S4. Primers used in this study
Primer name | Primer sequence (5'-3') | Species | Location | Use
VLRC-5UTR_F | AGTGTTGGGTCCCGTGCG | P. marinus | 5'-UTR | Primary amplification
VLRC-3UTR_R | ACGGGGATGTCTCTACTTTA | P. marinus | 3'-UTR | Primary amplification
VLRC5.1 | CTGAAACTGTTGACTGCAGTAGC | L. planeri | LRRNT | Primary amplification
VLRC5.2 | GACTGGGATTCCTGCAAACACCGAG | L. planeri | LRR1 | Heminested amplification
VLRC_3 | CAAAAGGCATGTTACACACATCCGTG | L. planeri | C terminus | Primary amplification
VLRC_5U | GCCGAGCCGCGATGGGGTTTGTCGTG | L. planeri | 5'-UTR; signal peptide | Primary amplification
VLRC_3U | CATATTTTTGTCGCCATGCAACG | L. planeri | 3'-UTR | Primary amplification
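As an illustrative consistency check (our own sketch, not an analysis reported here), the coding portions of two L. planeri primers in Table S4 can be translated and compared with the protein sequences of Fig. S1. The helper names are hypothetical; the script scans both strands in all three frames for a given peptide.

```python
from __future__ import annotations

# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def translate(seq: str, frame: int) -> str:
    """Translate from offset `frame`, ignoring a trailing partial codon."""
    seq = seq.upper()
    return "".join(CODON.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def locate(primer: str, peptide: str) -> list[tuple[str, int]]:
    """Return (strand, frame) pairs in which the primer encodes `peptide`."""
    found = []
    for strand, seq in (("+", primer), ("-", revcomp(primer))):
        for frame in range(3):
            if peptide in translate(seq, frame):
                found.append((strand, frame))
    return found


VLRC_5U = "GCCGAGCCGCGATGGGGTTTGTCGTG"  # Table S4: 5'-UTR; signal peptide
VLRC_3 = "CAAAAGGCATGTTACACACATCCGTG"   # Table S4: C terminus
print(translate(VLRC_5U, 2), translate(revcomp(VLRC_3), 1))
print(locate(VLRC_5U, "MGFVV"), locate(VLRC_3, "TDVCNMPF"))
```

Translating VLRC_5U in frame 2 gives RAAMGFVV, whose last five residues match the MGFVV N terminus of all three mature VLRCs in Fig. S1. The reverse complement of VLRC_3, read in frame 1, gives TDVCNMPF, which matches the C-terminal region, consistent with its use as a reverse primer.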