Supporting Information

Das et al. /pnas

Domains: SP, LRRNT, LRR1, LRRV1, LRRV2 (continues in the next block)
Pm-VLRC MGFVVALLVLGAWCGSCSAQ-RQRACVEAGKSDVCICSSATDSSPETVDCSSKTLATVPTGIPASTERLELQYNQLANIHAKAFHGLTRLTYLTLEQNKLQSLPVGVFDQLKDLN
Lc-VLRC MGFVVALLVLGAWCGSCSAQGQRRACLAVGKDDICTCSNKTDSSPETVDCSSKKLTAVPTGIPANTERLELQYNQLTAVPANAFKALTQLTYLNLDSNQLQSLPVGVFDQLKNLN
Lp-VLRC MGFVVALLVLGAWCGSCSAQGRERACFAAGKDDLCTCSNKTESSPETVDCSSPKLTTVPTGIPASTERLELQYNQLQTLPAGVFDQLTELGTLYLTTNQLKSLPPGVFDRLTKLT

Domains: LRRV2 (continued), LRRV3, LRRV4, CP, LRRCT (continues in the next block)
Pm-VLRC ELHLSINELKSLPSGVFDRLTKLKELWLNSNQLQSVPDGVFDKLGSLERLDLEQNQLQSVPDGAFDSLGKLELLDLQNNPWDCECASIIYFVNWLKKNPKHDSGASCEKPSGTAV
Lc-VLRC ELRLSNNQLKSLPERVFDSLTRLTYLNLAQNQLQSIPKGAFDKLTKLETLHLQTNKLQSVPEGAFDNLVDMQNMQLHDNPWDCECASIIYFVNWLKENPKHDSGASCKKPTGTAV
Lp-VLRC LLGLEQNQLQSIPKGVFDRLTNLQDLRLSTNQLQSVPHGAFDRLTNLQELRLYNNQLQSVPDGAFDSLTKVEMLQLHNNPWDCECASIIYFVNWLKENPKHDSGASCEKPAGTAV

Domains: LRRCT (continued), C-terminus
Pm-VLRC KDVNTELIEDVPCKHEIPTPKMTASPPNTATSVFTTELNSTTYPNATHEHTDVCNMPFVSHICLLFCNLFSTCSLCFIIKPLHRY
Lc-VLRC KDVKTKDVKNVPCNHVYPTSKITASSPTPATSIFIKKLNSTTNLNAIHEHRTHTDVCNMPFVSHMCLLFCNLFSTCSLCFIIKPLHRY
Lp-VLRC KDVKTEPIKNVPCKHVYPTPKITASSPTPATPIFIPELNSTTNLNAIHEHRTHTDVCNMPFVTHMCLLFCNLFSTCSLCFIIKPLHRY

Fig. S1. Comparison of mature variable lymphocyte receptor C (VLRC) in sea lamprey (Petromyzon marinus), arctic lamprey (Lethenteron camtschaticum), and European brook lamprey (Lampetra planeri) (GenBank accession nos. KC244058, AB507373, and KC247681, respectively).

Query: first round, 60 mature VLRC sequences; second round, the sequences retrieved in the first-round BLASTn search.
1. BLASTn search against the lamprey genome sequence (E-value 1e-5; identity 80%; length 30 nt).
2. Candidate sequences.
3. Exclusion of overlapping sequences.
4. Retrieval of only the largest nonoverlapping genomic fragments, each with a 300-nt extension upstream and downstream.
5. Identification of potential boundaries based on conservation in LRR modules and a similarity search using the SMART database.
6. Hypothetical translation in three reading frames.
7. Selection of in-frame sequences by alignment with mature VLRC.
8. VLRC-encoding genomic donor cassettes.

Fig. S2. Flowchart for identification of VLRC-encoding genomic cassettes in the P. marinus genome.
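The filtering, overlap-exclusion, and flank-extension steps of the Fig. S2 flowchart can be sketched as below. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the function names, the toy hits, and the scaffold names are hypothetical, and the direction of each threshold (E-value at most 1e-5, identity at least 80%, length at least 30 nt) is an assumption, since the flowchart gives only the values. "Largest nonoverlapping" is approximated with a greedy longest-first pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTn hit on a genomic scaffold (hypothetical record layout)."""
    scaffold: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    evalue: float
    identity: float  # percent identity

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def passes(hit, max_evalue=1e-5, min_identity=80.0, min_length=30):
    # Assumed comparator directions; the flowchart lists only the values.
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.length >= min_length)


def overlaps(a, b):
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end


def largest_non_overlapping(hits):
    """Greedy pass: keep the longest hits first, drop anything overlapping them."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.length, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.scaffold, h.start))


def extended_window(hit, flank=300):
    """Coordinates of the fragment plus a flank on each side, clamped at position 1."""
    return max(1, hit.start - flank), hit.end + flank


hits = [
    Hit("scaffold_A", 1000, 1086, 1e-20, 95.0),
    Hit("scaffold_A", 1040, 1070, 1e-6, 90.0),   # overlaps the first hit and is shorter
    Hit("scaffold_A", 5000, 5020, 1e-8, 99.0),   # only 21 nt long
    Hit("scaffold_B", 200, 286, 1e-3, 85.0),     # E-value above the cutoff
    Hit("scaffold_B", 900, 986, 1e-12, 82.0),
]
candidates = largest_non_overlapping([h for h in hits if passes(h)])
windows = [(h.scaffold, *extended_window(h)) for h in candidates]
```

A greedy pass is the simplest way to prefer larger fragments; an interval-scheduling formulation would be needed if the goal were to maximize total coverage instead.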

3' LRRNT-5' LRR1 and 3' LRR1-5' LRRV
TGCAGTcACAAGAAGCTGGCCACTGTTCCcACTGGGATTCCtgCAAGCACCGAgAaaCTAcAGCTaCActtCAACCAGCTGgCAAGCAACcAgCTGaCaagcaTccccGncAagGCgTTtcanggtCTCaCTcagcTcACtttCCTcgnccTcancaacAAcaagcTGcagTCtaatcagcTgcagagtnTtcccgaagGagtgTTtgAtaaaCTcaccaaccTgaaaacgcTgnacCTGcacancAatcagcTgcagagc

3' LRRV-CP-5' LRRCT
aataagttgcagAGcGTTCCTgAcGGggcnTTtGAcAgCCTcgccaaccTggagaccaTgaatCTccacaaCAaCCCCTGGgAttGt

Fig. S3. Sequence signatures for frequently used genomic donor cassettes. Presumptive consensus sequences are shown below. Only those genomic cassettes that appeared three or more times in mature VLRCs in the present dataset (60 sequences) were considered. Conserved regions that could potentially be used for the assembly process are indicated by horizontal lines.
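A consensus string of the mixed-case kind shown in Fig. S3 could be derived from aligned cassettes roughly as follows. The case convention used here is an assumption made for illustration only (uppercase where all sequences agree, lowercase for a majority base, `n` where the top bases tie); the figure does not state its rule. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Column-wise consensus of equal-length sequences.

    Assumed convention (not taken from the figure):
      uppercase = every sequence has this base,
      lowercase = most common base, but not unanimous,
      n         = the two most common bases are tied.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*(seq.upper() for seq in aligned)):
        ranked = Counter(column).most_common()
        base, count = ranked[0]
        if count == len(column):
            out.append(base)
        elif len(ranked) > 1 and ranked[1][1] == count:
            out.append("n")
        else:
            out.append(base.lower())
    return "".join(out)


# Toy alignment of three made-up 6-nt fragments.
toy = ["TGCAGT", "TGCAAT", "TGTACT"]
signature = consensus(toy)
```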

[Figure S4 tree; clade labels: 3' LRRV-CP-5' LRRCT, 3' LRR1-5' LRRV, 3' LRRNT-5' LRR1, outgroup (LRRCT).]

Fig. S4. Neighbor-joining phylogenetic tree of VLRC-encoding donor cassettes. The tree is condensed at the 50% bootstrap value level. The single circle and double circles (blue) indicate that the interior branches are supported by >75% and >95% bootstrap values, respectively. Colored symbols in the genomic cassettes correspond to those shown in Fig. 2. The tree was constructed using the pairwise deletion option and the p-distance method. Two C-terminal LRR (LRRCT)-encoding donor cassettes served as an outgroup.
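The p-distance with pairwise deletion named in the Fig. S4 legend is the proportion of differing sites among the sites where both sequences of a pair have a base. A minimal sketch follows; the function name, the set of characters treated as missing data, and the toy sequences are assumptions for illustration.

```python
def p_distance(seq_a, seq_b, missing="-N?"):
    """Proportion of differing sites, skipping sites missing in either sequence.

    Pairwise deletion: a site is dropped only for the pair in which it is
    missing, rather than being removed from the whole alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue  # site unusable for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Six comparable sites, one mismatch (G vs C at the third position).
distance = p_distance("ACGT-ACG", "ACCTTAC-")
```

With complete deletion, by contrast, any column containing a gap in any sequence would be removed before all pairwise comparisons.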

[Figure S5 tree; clade labels: 3' LRR1-5' LRRV, 3' LRRV-CP-5' LRRCT, 3' LRRNT-5' LRR1, outgroup (LRRCT).]

Fig. S5. Maximum likelihood phylogenetic tree (condensed at the 50% bootstrap value level) of VLRC-encoding donor cassettes. The single circle and double circles (blue) indicate that the interior branches are supported by >75% and >95% bootstrap values, respectively. The colored symbols in the genomic cassettes correspond to those shown in Fig. 2.

[Figure S6 panels. Left, nonrepetitious cassette assembly: mature VLRC assembled from donor cassettes a, d, c, f. Right, repetitious cassette assembly: mature VLRC assembled from donor cassettes a, b, b, d. Mature VLRC domain order in both panels: SP, LRRNT, LRR1, LRRV, LRRV, LRRV, LRRV, LRRV, CP, LRRCT, Stalk. Genomic donor cassettes are labeled a to g.]

Fig. S6. Nonrepetitious and repetitious donor genomic cassettes used in VLRC assembly. (Left) The 3' LRRV-5' LRRV donor cassettes used in VLRC assembly. This nonrepetitious donor genomic cassette assembly pattern is seen in the majority of mature VLRCs. (Right) The same donor cassette (3' LRRV-5' LRRV) is used repeatedly during VLRC assembly. Repeated use of the same donor cassette can be contiguous (as shown in the cartoon) or noncontiguous.

Table S1. Types of genomic VLRC donor cassettes
Donor cassette type | No. | Comments
3' LRRNT-5' LRR1 | 13 | Seven cassettes had high divergence in the base composition at either the 5' region or the 3' region.
3' LRR1-5' LRRV | 10 | No internal stop codon or high divergence in base composition was found in any region. Two cassettes located in the GL scaffold appear to be recent duplicates.
3' LRRV-5' LRRV | 103 | Thirty-one cassettes have either an internal stop codon or high divergence in the base composition at the 5' or 3' region. Two partial cassettes were found resulting from the incomplete genome sequence. Multiple potential duplication events were identified.
3' LRRV-CP-5' LRRCT | 54 | Twelve cassettes had either an internal stop codon or high divergence in the base composition at the 5' or 3' region. Multiple potential duplication events were identified.
LRRCT | 2 | These cassettes were located near the incomplete VLRC gene. The LRRCT was encoded either by these two donor cassettes or by the LRRCT-encoding region of the incomplete VLRC gene.
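The distinction drawn in Fig. S6 can be expressed as a small classifier over the ordered list of donor cassettes used in one mature VLRC. This is an illustrative sketch only: the function name and the returned labels are hypothetical, and the cassette letters follow the figure's a to g labeling.

```python
def classify_assembly(cassettes):
    """Classify the donor-cassette usage pattern of one mature VLRC.

    `cassettes` is the ordered list of donor cassette labels. An assembly
    with both adjacent and non-adjacent repeats is reported as contiguous.
    """
    has_repeat = len(set(cassettes)) < len(cassettes)
    if not has_repeat:
        return "nonrepetitious"
    adjacent_repeat = any(a == b for a, b in zip(cassettes, cassettes[1:]))
    if adjacent_repeat:
        return "repetitious (contiguous)"
    return "repetitious (noncontiguous)"


left_panel = classify_assembly(["a", "d", "c", "f"])    # pattern from Fig. S6, left
right_panel = classify_assembly(["a", "b", "b", "d"])   # pattern from Fig. S6, right
split_repeat = classify_assembly(["a", "b", "d", "b"])  # made-up noncontiguous case
```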

Table S2. VLRC-encoding loci and donor cassettes
Scaffold Start End Strand Description
GL , ,213 Reverse 3' LRRNT-5' LRR1*
GL , ,662 Reverse 3' LRRNT-5' LRR1
GL , ,974 Reverse 3' LRRNT-5' LRR1*
GL , ,072 Reverse 3' LRRV-5' LRRV
GL , ,251 Reverse 3' LRRV-5' LRRV
GL , ,886 Reverse 3' LRRV-5' LRRV*
GL , ,353 Reverse 3' LRRV-5' LRRV
GL , ,422 Reverse 3' LRRV-5' LRRV
GL , ,425 Reverse 3' LRRV-5' LRRV
GL , ,366 Reverse 3' LRRV-5' LRRV
GL , ,572 Reverse LRRCT2*
GL , ,370 Reverse 3' LRRNT-5' LRR1
GL , ,037 Reverse LRRCT1*
GL , ,992 Reverse VLRC exon 2
GL , ,723 Reverse VLRC exon 1
GL , ,104 Reverse NonLTR/Penelope
GL ,146 26,232 Forward 3' LRRNT-5' LRR1
GL ,100 29,186 Forward 3' LRRNT-5' LRR1
GL ,793 33,879 Forward 3' LRRNT-5' LRR1
GL Reverse 3' LRRNT-5' LRR1
GL ,017 3,106 Forward 3' LRR1-5' LRRV*
GL ,806 4,895 Forward 3' LRR1-5' LRRV
GL ,062 6,148 Reverse 3' LRRV-5' LRRV*
GL ,048 7,137 Forward 3' LRR1-5' LRRV*
GL ,696 7,782 Reverse 3' LRRV-5' LRRV
GL ,198 10,284 Forward 3' LRRV-5' LRRV
GL ,070 17,156 Forward 3' LRRV-5' LRRV*
GL ,611 27,697 Forward 3' LRRV-5' LRRV
GL ,369 1,455 Reverse 3' LRRV-5' LRRV*
GL ,421 2,507 Forward 3' LRRV-5' LRRV
GL ,722 3,808 Forward 3' LRRV-5' LRRV*
GL ,409 4,495 Forward 3' LRRV-5' LRRV*
GL ,147 5,233 Forward 3' LRRV-5' LRRV
GL ,347 6,433 Forward 3' LRRV-5' LRRV
GL ,935 7,021 Forward 3' LRRV-5' LRRV*
GL ,782 7,868 Forward 3' LRRV-5' LRRV
GL ,078 8,164 Forward 3' LRRV-CP-5' LRRCT
GL ,258 9,344 Forward 3' LRRV-5' LRRV
GL ,733 9,819 Forward 3' LRRV-5' LRRV
GL ,179 10,265 Forward 3' LRRV-CP-5' LRRCT
GL ,359 11,445 Forward 3' LRRV-5' LRRV
GL ,827 11,913 Forward 3' LRRV-5' LRRV
GL ,131 12,217 Forward 3' LRRV-CP-5' LRRCT
GL ,216 8,302 Reverse 3' LRRV-5' LRRV
GL ,982 9,068 Reverse 3' LRRV-5' LRRV
GL ,762 12,848 Forward 3' LRRV-5' LRRV
GL ,466 23,552 Forward 3' LRRV-5' LRRV
GL ,276 1,362 Reverse 3' LRRV-5' LRRV*
GL ,328 2,414 Forward 3' LRRV-5' LRRV
GL ,734 3,820 Forward 3' LRRV-5' LRRV*
GL ,421 4,507 Forward 3' LRRV-5' LRRV
GL ,159 5,245 Forward 3' LRRV-5' LRRV
GL ,359 6,445 Forward 3' LRRV-5' LRRV
GL ,947 7,033 Forward 3' LRRV-5' LRRV*
GL ,413 7,499 Forward 3' LRRV-5' LRRV
GL ,709 7,795 Forward 3' LRRV-CP-5' LRRCT
GL ,879 8,965 Forward 3' LRRV-5' LRRV
GL ,347 9,433 Forward 3' LRRV-5' LRRV
GL ,651 9,737 Forward 3' LRRV-CP-5' LRRCT
GL ,842 10,928 Forward 3' LRRV-5' LRRV*
GL ,801 11,887 Forward 3' LRRV-5' LRRV
GL ,090 12,176 Forward 3' LRRV-CP-5' LRRCT
GL ,618 17,704 Forward 3' LRRV-5' LRRV
GL ,107 18,195 Forward 3' LRRV-5' LRRV
GL ,389 19,475 Forward 3' LRRV-CP-5' LRRCT
GL ,016 20,102 Forward 3' LRRV-5' LRRV*
GL ,733 21,819 Forward 3' LRRV-5' LRRV
GL ,027 23,113 Forward 3' LRRV-CP-5' LRRCT
GL ,656 23,742 Forward 3' LRRV-5' LRRV*
GL ,966 2,052 Forward 3' LRRV-5' LRRV
GL ,404 5,490 Reverse 3' LRRV-5' LRRV
GL ,140 10,226 Reverse 3' LRRV-5' LRRV
GL ,649 11,735 Reverse 3' LRRV-5' LRRV
GL ,915 13,001 Reverse 3' LRRV-5' LRRV
GL ,213 17,299 Reverse 3' LRRV-5' LRRV
GL ,877 18,963 Reverse 3' LRRV-5' LRRV
GL ,442 20,528 Reverse 3' LRRV-5' LRRV
GL ,709 21,795 Reverse 3' LRRV-5' LRRV
GL ,030 3,116 Forward 3' LRRV-CP-5' LRRCT
GL ,745 3,831 Forward 3' LRRV-CP-5' LRRCT*
GL , ,419 Forward 3' LRRV-5' LRRV
GL , ,576 Reverse 3' LRRV-5' LRRV*
GL , ,271 Reverse 3' LRRV-5' LRRV
GL , ,646 Forward 3' LRRV-5' LRRV
GL ,683 5,769 Forward 3' LRRV-5' LRRV*
GL ,002 8,088 Forward 3' LRRV-5' LRRV
GL ,280 10,366 Reverse 3' LRRV-5' LRRV
GL ,673 12,759 Reverse 3' LRRV-5' LRRV
GL ,817 14,903 Reverse 3' LRRV-5' LRRV
GL ,006 20,092 Reverse 3' LRRV-5' LRRV
GL ,664 23,750 Forward 3' LRRV-5' LRRV
GL ,026 25,112 Forward 3' LRRV-5' LRRV*
GL ,670 25,756 Forward 3' LRRV-5' LRRV
GL , ,560 Forward 3' LRRV-CP-5' LRRCT*
GL , ,653 Reverse 3' LRRV-5' LRRV*
GL , ,513 Forward 3' LRRV-5' LRRV*
GL , ,680 Forward 3' LRR1-5' LRRV
GL , ,740 Forward 3' LRR1-5' LRRV
GL , ,232 Reverse 3' LRRV-5' LRRV
GL , ,328 Forward 3' LRRV-5' LRRV*
GL , ,249 Forward 3' LRRV-5' LRRV
GL , ,366 Forward 3' LRRV-5' LRRV
GL Reverse 3' LRR1-5' LRRV*
GL ,866 4,952 Reverse 3' LRR1-5' LRRV
GL ,152 19,238 Forward 3' LRR1-5' LRRV
GL ,773 19,859 Reverse 3' LRR1-5' LRRV
GL , ,098 Forward 3' LRRV-CP-5' LRRCT
GL , ,386 Reverse 3' LRRV-CP-5' LRRCT
GL , ,385 Forward 3' LRRV-CP-5' LRRCT
GL , ,929 Forward 3' LRRV-CP-5' LRRCT
GL , ,923 Forward 3' LRRV-CP-5' LRRCT
GL , ,241 Reverse 3' LRRV-CP-5' LRRCT
GL , ,455 Forward 3' LRRV-CP-5' LRRCT
GL , ,873 Forward 3' LRRV-CP-5' LRRCT
GL , ,865 Forward 3' LRRV-CP-5' LRRCT
GL , ,183 Reverse 3' LRRV-CP-5' LRRCT
GL , ,189 Forward 3' LRRV-CP-5' LRRCT
GL , ,695 Forward 3' LRRV-CP-5' LRRCT
GL , ,947 Forward 3' LRRV-CP-5' LRRCT
GL , ,260 Reverse 3' LRRV-CP-5' LRRCT
GL , ,455 Forward 3' LRRV-CP-5' LRRCT
GL , ,058 Forward 3' LRRV-CP-5' LRRCT
GL , ,823 Forward 3' LRRV-CP-5' LRRCT*
GL , ,141 Reverse 3' LRRV-CP-5' LRRCT
GL , ,137 Forward 3' LRRV-CP-5' LRRCT*
GL , ,550 Forward 3' LRRV-CP-5' LRRCT
GL , ,105 Reverse 3' LRRV-CP-5' LRRCT
GL , ,388 Reverse 3' LRRV-CP-5' LRRCT
GL , ,863 Forward 3' LRRV-5' LRRV
GL , ,258 Forward 3' LRRV-5' LRRV
GL , ,096 Reverse 3' LRRV-CP-5' LRRCT
GL , ,435 Forward 3' LRRV-5' LRRV
GL , ,075 Forward 3' LRRV-5' LRRV
GL , ,969 Forward 3' LRRV-5' LRRV
GL , ,742 Reverse 3' LRRV-5' LRRV
GL , ,070 Reverse 3' LRRV-5' LRRV
GL , ,724 Reverse 3' LRRV-5' LRRV*
GL , ,039 Forward 3' LRRV-5' LRRV
GL , ,082 Forward 3' LRRV-5' LRRV
GL , ,966 Reverse 3' LRRV-5' LRRV
GL , ,177 Reverse 3' LRRV-5' LRRV
GL , ,907 Forward 3' LRRV-5' LRRV
GL , ,448 Reverse 3' LRRV-5' LRRV
GL ,641 20,727 Reverse 3' LRRV-5' LRRV
GL ,213 2,299 Forward 3' LRRV-5' LRRV
GL Forward 3' LRRV-CP-5' LRRCT
GL ,841 1,927 Forward 3' LRRV-CP-5' LRRCT*
GL ,699 2,785 Reverse 3' LRRV-CP-5' LRRCT
GL ,209 3,295 Forward 3' LRRV-CP-5' LRRCT
GL ,603 3,689 Forward 3' LRRV-CP-5' LRRCT*
GL ,896 6,982 Forward 3' LRRV-CP-5' LRRCT
GL ,961 8,047 Reverse 3' LRRV-CP-5' LRRCT*
GL ,559 8,645 Forward 3' LRRV-CP-5' LRRCT
GL ,366 11,452 Forward 3' LRRV-5' LRRV
GL ,118 12,204 Reverse 3' LRRV-5' LRRV*
GL ,229 13,315 Forward 3' LRRV-5' LRRV
GL ,774 17,860 Forward 3' LRRV-CP-5' LRRCT
GL ,281 18,367 Reverse 3' LRRV-5' LRRV*
GL ,285 19,371 Forward 3' LRRV-CP-5' LRRCT*
GL ,755 19,841 Reverse 3' LRRV-5' LRRV
GL ,307 21,393 Forward 3' LRRV-CP-5' LRRCT
GL ,246 23,332 Reverse 3' LRRV-5' LRRV
GL ,129 24,215 Reverse 3' LRR1-5' LRRV
GL ,474 20,560 Reverse 3' LRRV-CP-5' LRRCT
GL ,774 20,860 Reverse 3' LRRV-CP-5' LRRCT
GL ,194 22,280 Reverse 3' LRRV-CP-5' LRRCT
GL ,480 27,566 Reverse 3' LRRV-CP-5' LRRCT
GL ,259 29,345 Reverse 3' LRRV-CP-5' LRRCT
GL ,548 32,634 Reverse 3' LRRV-CP-5' LRRCT
GL ,777 38,863 Reverse 3' LRRV-CP-5' LRRCT
GL ,401 41,487 Reverse 3' LRRV-CP-5' LRRCT
GL ,315 8,398 Reverse 3' LRRV-CP-5' LRRCT
GL ,826 12,912 Reverse 3' LRRNT-5' LRR1
GL ,110 14,196 Reverse 3' LRRNT-5' LRR1
GL ,703 15,789 Reverse 3' LRRNT-5' LRR1
GL ,312 18,398 Reverse 3' LRRNT-5' LRR1
GL Forward 3' LRRNT-5' LRR1
GL Reverse 3' LRRV-5' LRRV
GL ,426 7,512 Forward 3' LRRV-5' LRRV*
GL ,251,235 3,251,321 Forward 3' LRRV-5' LRRV
GL ,254,583 3,254,669 Forward 3' LRRV-5' LRRV
GL , ,927 Forward 3' LRRV-5' LRRV
GL ,925 18,011 Forward 3' LRRV-5' LRRV
GL ,056 12,142 Forward 3' LRRV-5' LRRV
GL ,919 13,005 Forward 3' LRRV-5' LRRV*

*Genomic donor cassettes appearing at least three times in the dataset of 60 mature VLRCs.
Genomic donor cassettes appearing seven or more times in the dataset of 60 mature VLRCs.
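For rows of Table S2 whose start and end coordinates are both intact, the cassette length follows directly from the coordinates. The sketch below assumes the coordinates are 1-based and inclusive (not stated in the table), and the helper names are hypothetical. It uses the two rows with start 3,251,235 and 3,254,583.

```python
def parse_coordinate(text):
    """Convert a comma-grouped coordinate such as '3,251,235' to an integer."""
    return int(text.replace(",", ""))


def span_length(start, end):
    """Length in nucleotides, assuming 1-based inclusive coordinates."""
    return parse_coordinate(end) - parse_coordinate(start) + 1


# The two Table S2 rows in which both coordinates survive in full.
rows = [("3,251,235", "3,251,321"), ("3,254,583", "3,254,669")]
lengths = [span_length(start, end) for start, end in rows]
codons = [length // 3 for length in lengths]
```

Under that assumption both spans come to 87 nt, which is a whole number of codons (29).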

Table S3. Characterization of partial VLRC assemblies of L. planeri
Clone | Type of assembly | Description | GenBank accession no.
VLRC#8_TT_6 | 3' assembly | Insertion of LRRCT module A* | KC
VLRC#8_TT_16 | 3' assembly | Insertion of LRRCT module B* | KC
VLRC#8_TT_45 | 3' assembly | Insertions of LRRCT module A* and CP module X | KC
VLRC#8_TT_36 | 3' assembly | Insertions of LRRCT module A*, CP module Y, and LRRV module | KC
VLRC#8_TT_13 | 5' assembly | Insertion of incomplete LRR1 module | KC
VLRC#8_TT_108 | 5' assembly | Insertion of complete LRR1 module | KC
VLRC#8_TT_44 | 5' assembly | Insertions of LRR1 and LRRV modules | KC

The sequence of the VLRC gene has been deposited in GenBank (accession no. KC247680).
*Both genomic C-terminal LRR (LRRCT) modules encode a sequence that is 2 aa residues longer than that of the germ-line sequence (similar to the situation in P. marinus).

Table S4. Primers used in this study
Primer name | Primer sequence (5' to 3') | Species | Location | Use
VLRC-5UTR_F | AGTGTTGGGTCCCGTGCG | P. marinus | 5'-UTR | Primary amplification
VLRC-3UTR_R | ACGGGGATGTCTCTACTTTA | P. marinus | 3'-UTR | Primary amplification
VLRC5.1 | CTGAAACTGTTGACTGCAGTAGC | L. planeri | LRRNT | Primary amplification
VLRC5.2 | GACTGGGATTCCTGCAAACACCGAG | L. planeri | LRR1 | Heminested amplification
VLRC_3 | CAAAAGGCATGTTACACACATCCGTG | L. planeri | C terminus | Primary amplification
VLRC_5U | GCCGAGCCGCGATGGGGTTTGTCGTG | L. planeri | 5'-UTR; signal peptide | Primary amplification
VLRC_3U | CATATTTTTGTCGCCATGCAACG | L. planeri | 3'-UTR | Primary amplification

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Supplementary Information

Supplementary Information Supplementary Information Supplementary Figure 1. Schematic pipeline for single-cell genome assembly, cleaning and annotation. a. The assembly process was optimized to account for multiple cells putatively

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Introduction to Hidden Markov Models (HMMs)

Introduction to Hidden Markov Models (HMMs) Introduction to Hidden Markov Models (HMMs) But first, some probability and statistics background Important Topics 1.! Random Variables and Probability 2.! Probability Distributions 3.! Parameter Estimation

More information

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses. Kirsi Kostamo Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

More information

2 Genome evolution: gene fusion versus gene fission

2 Genome evolution: gene fusion versus gene fission 2 Genome evolution: gene fusion versus gene fission Berend Snel, Peer Bork and Martijn A. Huynen Trends in Genetics 16 (2000) 9-11 13 Chapter 2 Introduction With the advent of complete genome sequencing,

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

HMMs and biological sequence analysis

HMMs and biological sequence analysis HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

More information

Eukaryotic vs. Prokaryotic genes

Eukaryotic vs. Prokaryotic genes BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

Computational Genomics and Molecular Biology, Fall

Computational Genomics and Molecular Biology, Fall Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related

More information

Illegitimate translation causes unexpected gene expression from on-target out-of-frame alleles

Illegitimate translation causes unexpected gene expression from on-target out-of-frame alleles Illegitimate translation causes unexpected gene expression from on-target out-of-frame alleles created by CRISPR-Cas9 Shigeru Makino, Ryutaro Fukumura, Yoichi Gondo* Mutagenesis and Genomics Team, RIKEN

More information

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007 -2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction

More information

From Gene to Protein

From Gene to Protein From Gene to Protein Gene Expression Process by which DNA directs the synthesis of a protein 2 stages transcription translation All organisms One gene one protein 1. Transcription of DNA Gene Composed

More information

Using Bioinformatics to Study Evolutionary Relationships Instructions

Using Bioinformatics to Study Evolutionary Relationships Instructions 3 Using Bioinformatics to Study Evolutionary Relationships Instructions Student Researcher Background: Making and Using Multiple Sequence Alignments One of the primary tasks of genetic researchers is comparing

More information

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and counterclockwise for the inner row, with green representing coding

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

Genomic insights into the taxonomic status of the Bacillus cereus group. Laboratory of Marine Genetic Resources, Third Institute of Oceanography,

Genomic insights into the taxonomic status of the Bacillus cereus group. Laboratory of Marine Genetic Resources, Third Institute of Oceanography, 1 2 3 Genomic insights into the taxonomic status of the Bacillus cereus group Yang Liu 1, Qiliang Lai 1, Markus Göker 2, Jan P. Meier-Kolthoff 2, Meng Wang 3, Yamin Sun 3, Lei Wang 3 and Zongze Shao 1*

More information

Comparing whole genomes

Comparing whole genomes BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

More information

Similarity or Identity? When are molecules similar?

Similarity or Identity? When are molecules similar? Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are

More information

Assembly improvement: based on Ragout approach. student: Anna Lioznova scientific advisor: Son Pham

Assembly improvement: based on Ragout approach. student: Anna Lioznova scientific advisor: Son Pham Assembly improvement: based on Ragout approach student: Anna Lioznova scientific advisor: Son Pham Plan Ragout overview Datasets Assembly improvements Quality overlap graph paired-end reads Coverage Plan

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

More information

Figure A1. Phylogenetic trees based on concatenated sequences of eight MLST loci. Phylogenetic trees were constructed based on concatenated sequences

Figure A1. Phylogenetic trees based on concatenated sequences of eight MLST loci. Phylogenetic trees were constructed based on concatenated sequences A. B. Figure A1. Phylogenetic trees based on concatenated sequences of eight MLST loci. Phylogenetic trees were constructed based on concatenated sequences of eight housekeeping loci for 12 unique STs

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Supporting Information

Supporting Information Supporting Information Ziemert et al. 10.1073/pnas.1324161111 Fig. S1. Geographic origin and numbers of Salinispora strains used in this study. Fig. S2. Operational biosynthetic unit (OBU) phylogeny supports

More information

Gre C G G A T T A T T C A T A T A A T T G T T A T A C C A G A C G G T C G C

Gre C G G A T T A T T C A T A T A A T T G T T A T A C C A G A C G G T C G C 2.1.1 PAH cdna Sequence Compiled by S.Byck, P.M.Nowacki, and Max Caccione. This Reference Sequence (simplified) is deposited in GenBank: U49897. This sequence is derived from the following: Partial 5'UTR

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

CRITICA: Coding Region Identification Tool Invoking Comparative Analysis

CRITICA: Coding Region Identification Tool Invoking Comparative Analysis CRITICA: Coding Region Identification Tool Invoking Comparative Analysis Jonathan H. Badger and Gary J. Olsen Department of Microbiology, University of Illinois Gene recognition is essential to understanding

More information

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes David DeCaprio, Ying Li, Hung Nguyen (sequenced Ascomycetes genomes courtesy of the Broad Institute) Phylogenomics Combining whole genome

More information

The Phylogenetic Reconstruction of the Grass Family (Poaceae) Using matK Gene Sequences

By Hongping Liang. Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Protein Structure Comparison. Motivation: Understand sequence and structure variability. Understand domain architecture

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites

Paper by: James P. Balhoff and Gregory A. Wray. Presentation by: Stephanie Lucas. Reviewed

1. In most cases, genes code for and it is that

Chapter 10 Reading Guide. From DNA to Protein: Gene Expression. Concept 10.1: Genetics Shows That Genes Code for Proteins. 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Contents: Alignment algorithms: Needleman-Wunsch (global alignment), Smith-Waterman (local alignment). Heuristic algorithms: FASTA, BLAST

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Contents: What is phylogeny? How and why is it possible to infer it? Representing evolutionary relationships on trees. What type of questions

A Browser for Pig Genome Data

Thomas Mailund, January 2, 2004. This report briefly describes the BLAST and alignment data available at http://www.daimi.au.dk/ mailund/pig-genome/ hits.html. The report describes

Overview of IslandPick pipeline and the generation of GI datasets

Predicting GIs using comparative genomics: By using whole genome alignments we can identify regions that are present in one genome but not

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary, PhD of infectious diseases, Department of Animal Medicine (Infectious Diseases), Faculty of Veterinary Medicine, Assiut University-Egypt. Phylogenetic analysis. Phylogenetic Basics: Biological

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family

Jieming Shen (1,2) and Hugh B. Nicholas, Jr. (3). (1) Bioengineering and Bioinformatics Summer

Sequence analysis and Genomics

October 12th to November 23rd, 2 PM to 5 PM. Prof. Peter Stadler, Dr. Katja Nowick. Katja: group leader, TFome and Transcriptome Evolution, Bioinformatics group, Paul-Flechsig-Institute

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Proteomics. Pairwise similarity searching (1). Figure 5.5: manual alignment. One of the amino acids in the top sequence has no equivalent and

Annotation of Plant Genomes using RNA-seq. Matteo Pellegrini (UCLA) In collaboration with Sabeeha Merchant (UCLA)

What is Annotation? Reconstruction

Bioinformatics Exercises

AP Biology Teachers Workshop. Susan Cates, Ph.D. Evolution of Species: Phylogenetic trees show the relatedness of organisms. Common Ancestor (root of the tree). Rooted vs. Unrooted

Evolutionary Tree Analysis. Overview

CSI/BINF 5330. Young-Rae Cho, Associate Professor, Department of Computer Science, Baylor University. Overview: Backgrounds. Distance-Based Evolutionary Tree Reconstruction. Character-Based

Dr. Amira A. AL-Hosary

Phylogenetic analysis. Amira A. AL-Hosary, PhD of infectious diseases, Department of Animal Medicine (Infectious Diseases), Faculty of Veterinary Medicine, Assiut University-Egypt. Phylogenetic Basics: Biological

Figure S1: Phylogenetic tree of Pseudomonas and related bacteria. Phylogenetic trees were generated using parsimony, neighbor-joining and maximum

Phylogenetic trees were [...] are indicated in bold. Sequences retrieved from Col du Midi (Alps) are designated by the word Coldumidi followed

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn 2009

A data example for phylogeny inference: Five DNA sequences, for some gene in an imaginary group of species whose names

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Detailed overview of the primer-free full-length SSU rRNA library preparation.

a,bD (modules 1 and 10 are required)

This form should be used for all taxonomic proposals. Please complete all those modules that are applicable (and then delete the unwanted sections). For guidance, see the notes written in blue and the

Study and Implementation of Various Techniques Involved in DNA and Protein Sequence Analysis

Kumud Joseph Kujur, Sumit Pal Singh, O.P. Vyas, Ruchir Bhatia, Varun Singh*. Indian Institute of Information

Nature Methods: doi: /nmeth Supplementary Figure 1. Fragment indexing allows efficient spectra similarity comparisons.

The cost and efficiency of spectra similarity calculations can be approximated by the number of fragment comparisons

aP. Short title: Mulberry badnavirus 1, a new species in the Badnavirus genus (e.g. 6 new species in the genus Zetavirus) Modules attached

This form should be used for all taxonomic proposals. Please complete all those modules that are applicable (and then delete the unwanted sections). For guidance, see the notes written in blue and the

GEP Annotation Report

Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:

Phylogenetic trees 07/10/13

A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

Videos. Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg Crashcourse: Transcription and Translation - https://youtu.

Translation. Videos: Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg ; Crashcourse, Transcription and Translation: https://youtu.be/itsb2sqr-r0 . Translation. The

Tandem Mass Spectrometry: Generating function, alignment and assembly

With slides from Sangtae Kim and from Jones & Pevzner 2004. Determining reliability of identifications: Can we use Target/Decoy to estimate

Package vhica. April 5, 2016

Type: Package. Title: Vertical and Horizontal Inheritance Consistence Analysis. Version: 0.2.4. Date: 2016-04-04. Author: Arnaud Le Rouzic. Suggests: ape, plotrix, parallel, seqinr, gtools

Additional file 10. Classification of Pac sequences based on maximum-likelihood (ML) phylogenetic analyses. Analyses were performed on the same

Additional file 10. Classification of Pac sequences based on maximum-likelihood (ML) phylogenetic analyses. Analyses were performed on the same dataset alignments used for crucial Neighbor-joining trees

Sequencing alignment Ameer Effat M. Elfarash

Dept. of Genetics, Fac. of Agriculture, Assiut Univ. amir_effat@yahoo.com. Why perform a multiple sequence alignment? MSAs are at the heart of comparative genomics

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species.

Homologs of the effector-coding gene lega15 (orange) were found within Icm/Dot region I in 13 Legionella species. In four

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Supplementary Note S2. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

Organic Chemistry Option II: Chemical Biology

Recommended books: Dr Stuart Conway, Department of Chemistry, Chemistry Research Laboratory, University of Oxford, email: stuart.conway@chem.ox.ac.uk. Teaching

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Tore Samuelsson, Nov 2009. Applications of phylogenetic methods: Reconstruction of evolutionary history / Resolving taxonomy issues

Hands-On Nine The PAX6 Gene and Protein

Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of Pax6, a master gene of eye formation.

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Eukaryotic mRNA processing. Cap structure: a modified guanine base is added to the 5' end. Poly-A tail

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Introduction to Bioinformatics online course: IBT. Jonathan Kayondo. Learning Objectives: Understand

Multiple Sequence Alignment

Multiple Alignment versus Pairwise Alignment. Up until now we have only tried to align two sequences. What about more than two? And what for? A faint similarity between two sequences

Phylogenetic Tree Generation using Different Scoring Methods

International Journal of Computer Applications (0975 8887). Rajbir Singh, Associate Prof. & Head, Department of IT, LLRIET, Moga. Sinapreet Kaur, Student

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Tore Samuelsson, Nov 200. Applications of phylogenetic methods: Reconstruction of evolutionary history / Resolving taxonomy issues

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18, June 2nd, 2016. From Gene to Protein. Central Dogma: Replication, DNA, RNA, Protein, Transcription, Translation. RNA viruses: genome is RNA. Reverse Transcriptase

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

10 December 2012 - Corrections - Exercise 1: Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila

Title: A novel mechanism of protein thermostability: a unique N-terminal domain confers

Title: A novel mechanism of protein thermostability: a unique N-terminal domain confers heat resistance to Fe/Mn-SODs. Running Title: Thermostability-improving peptide for SODs. Authors: Wei

Synteny Portal Documentation

Synteny Portal is a web application portal for visualizing, browsing, searching and building synteny blocks. Synteny Portal provides four main web applications: SynCircos,

Chapter 5. The mammalian cell entry 1 (mce1) operon of Mycobacterium leprae and Mycobacterium tuberculosis

Harald G. Wiker, Eric Spierings, Marc A. B. Kolkman, Tom H. M. Ottenhoff, and Morten

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

MTAT.03.239, 25.09.2012. Sequence analysis is important for... Prediction of function. Gene finding: the process of identifying the regions of genomic DNA that encode

Supporting Online Material for

www.sciencemag.org/cgi/content/full/314/5799/642/dc1 Supporting Online Material for Thrice out of Africa: Ancient and Recent Expansions of the Honey Bee, Apis mellifera. Charles W. Whitfield,* Susanta

Today's Lecture: HMMs

Definitions. Examples. Probability calculations. WDAG. Dynamic programming algorithms: Forward, Viterbi. Parameter estimation: Viterbi training. Hidden Markov Models: Probability models

Phylogenetic inference

Bas E. Dutilh. Systems Biology: Bioinformatic Data Analysis, Utrecht University, March 7th 016. After this lecture, you can discuss (dis-)advantages of different information types

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Laura Salter Kubatko, Departments of Statistics and Evolution, Ecology, and Organismal Biology, The Ohio State University. kubatko.2@osu.edu

Bioinformatics and BLAST

Overview: Recap of last time. Similarity discussion. Algorithms: Needleman-Wunsch, Smith-Waterman, BLAST. Implementation issues and current research. Recap from Last Time: Genome consists

Multiple Whole Genome Alignment

BMI/CS 776, www.biostat.wisc.edu/bmi776/ Spring 206. Anthony Gitter, gitter@biostat.wisc.edu. These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

Motivating the need for optimal sequence alignments...

Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment to infer homology; (ii) use

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

COMP 571 - Spring 2015. Luay Nakhleh, Rice University. Bayes' Rule: $P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{P(X = x)\,P(Y = y \mid X = x)}{\sum_{x'} P(X = x')\,P(Y = y \mid X = x')}$