Causes for the Large Genome Size in a Cyanobacterium


Genome Informatics 15(1): 229-238 (2004)

Causes for the Large Genome Size in a Cyanobacterium Anabaena sp. PCC7120

Nobuyoshi Sugaya 1 (sugaya@ims.u-tokyo.ac.jp), Makihiko Sato 1,2 (makihiko@ims.u-tokyo.ac.jp), Hiroo Murakami 1 (hiroo@ims.u-tokyo.ac.jp), Akira Imaizumi 1,3 (akima@ims.u-tokyo.ac.jp), Sachiyo Aburatani 1 (sachiyo@ims.u-tokyo.ac.jp), Katsuhisa Horimoto 1 (khorimot@ims.u-tokyo.ac.jp)

1 Laboratory of Biostatistics, Human Genome Center, Institute of Medical Science, University of Tokyo, Shirokane-dai, Minato-ku, Tokyo, Japan
2 Computer Science and Engineering Centre, Fujitsu Ltd., Nakase, Mihama-ku, Chiba City, Chiba, Japan
3 Advanced Technology Department, Fermentation and Biotechnology Laboratories, AJINOMOTO CO., INC., 1-1 Suzuki-cho, Kawasaki-ku, Kawasaki-shi, Japan

Abstract

Three possible causes of the large genome size of the cyanobacterium Anabaena sp. PCC7120 are investigated: 1) sequential tandem duplications of gene segments, genes or genomic segments, 2) horizontal gene transfers from other organisms, and 3) whole-genome duplication. We evaluated the frequency distribution of angles between paralog locations for possibility 1), the fraction of genes deviating in GC content, GC skew, AT skew and codon adaptation index for possibility 2), and the gene-configuration comparison of paralogs for possibility 3). As a result, possibility 3), whole-genome duplication, was more reasonable than the other two as a molecular cause of the large genome size in Anabaena sp. PCC7120. In addition, whole-genome duplication was supported by an analysis of the distribution pattern of protein genes with respect to functional categories.

Keywords: genome-size increase, tandem gene duplication, horizontal gene transfer, whole-genome duplication, gene-location distance, Anabaena sp. PCC7120

1 Introduction

In the phylum Cyanobacteria, complete genomic sequences have been determined for eight organisms: Synechocystis sp. PCC6803 [12], Anabaena sp. PCC7120 [11], Thermosynechococcus elongatus BP-1 [20], Synechococcus sp. WH8102 [24], Prochlorococcus marinus SS120 [4], P. marinus MED4, P. marinus MIT9313 [26] and Gloeobacter violaceus PCC7421 [21]. Interestingly, only one of these cyanobacteria, Anabaena sp. PCC7120 (hereafter Anabaena), has a considerably large genome of approximately 6.4 megabases (Mb), while the remaining species have medium or small genomes ranging from approximately 1.7 Mb to 4.7 Mb. Size differences in bacterial genomes are well known between parasitic bacteria such as Mycoplasma and Buchnera and free-living bacteria [2, 16, 17]. The genomes of the former are almost all about 1 Mb or less, while those of the latter are on average larger. This marked difference has been ascribed to genome-size reduction in parasitic species, because a large number of genes involved in biological processes such as biosynthesis of nutrients, cell motility and DNA repair could have been lost from their genomes during the course of evolution [2, 16, 17].

In contrast, what causes a remarkable genome-size increase in free-living bacteria is a difficult question. Since all complete cyanobacterial genome sequences have been derived from free-living species and the genomes show a remarkable size variation, they provide a good opportunity to investigate the molecular cause of genome-size differences in free-living bacteria. In previous studies, three causes have been considered for genome-size differences between free-living bacteria (Fig. 1): 1) sequential tandem duplications of gene segments, single genes or genomic segments composed of strings of several genes (e.g. operons) [7, 28], 2) horizontal gene transfers from other organisms [10, 22] and 3) whole-genome duplication [13, 25, 29]. In particular, there have been few studies of whole-genome duplication since the first complete genomic sequence was determined, while the former two causes have been intensively investigated on a genomic scale. In this study, we investigate these three possible molecular causes of the large genome size in Anabaena.

Figure 1: Schematic demonstration of the three mechanisms for bacterial genome enlargement. Genes encoded on an original genome and those newly arisen by each mechanism are shown by closed and open boxes, respectively.

2 Materials and Methods

2.1 Genomic Data

The information about all proteins encoded on the genome of the cyanobacterium Anabaena [11] is obtained from the FTP site of the National Center for Biotechnology Information [31]. Although Anabaena carries several plasmids, we analyze only the proteins encoded on the chromosome in this study. The size of the Anabaena chromosome is 6,413,771 base pairs, and the chromosome encodes 5,366 proteins.

2.2 Paralogs in the Anabaena Genome

To investigate the possibility of tandem gene duplications, we calculate the difference between the locations of paralogs within gene families. The difference is expected to be small if tandem duplications caused the large genome size in Anabaena. Paralog families are searched with the program BLASTCLUST [1]. A protein is added to a paralog family by the single-linkage clustering algorithm when the e-value between the protein and a member of the family satisfies $e \le 10^{-10}$ and the pairwise alignment between them covers at least 60% of both amino acid sequences. Paralog pairs used for calculating the gene-location distance between the two half genomes of Anabaena (see below) are detected with the program BLASTP [1] as pairs satisfying the criteria of reciprocal best hit and an e-value cutoff.

2.3 Calculation of GC Content, GC Skew, AT Skew and Codon Adaptation Index (CAI)

To estimate whether a gene has been horizontally transferred or vertically inherited, the degree of deviation in nucleotide composition is investigated. For this purpose, four measures frequently used for this estimation are calculated: GC content, GC skew, AT skew, and codon usage bias [3, 14, 15, 18].
The values of GC content, GC skew and AT skew are calculated for each gene as $(f_G + f_C)/(f_A + f_T + f_G + f_C)$, $(f_C - f_G)/(f_C + f_G)$ and $(f_A - f_T)/(f_A + f_T)$, respectively, where $f_N$ denotes the frequency of base N in the gene. Codon usage bias is estimated by the CAI, which measures the degree of bias toward the subset of codons used by highly expressed genes in an organism [27]. Following the previous study of Synechocystis sp. PCC6803 (Table 1 in Mrázek et al. [19]), the ribosomal proteins and the proteins orthologous to predicted highly expressed ones are selected as the reference set. The set in Anabaena is composed of 55 ribosomal proteins, 13 photosynthesis/respiration-related proteins, 6 chaperones, 5 translation/transcription processing factors and 11 proteins involved in other functions.

2.4 Calculation of Gene-Location Distance (GLD)

To investigate the possibility of whole-genome duplication, we examine the gene configuration of paralogs between two half regions of the Anabaena genome. The procedure is shown schematically in Fig. 2. To estimate the similarity of paralog configuration in two hypothetical half genomes, we calculate for each paralog pair the gene-location distance (GLD) [8, 9], which is derived from the correlation coefficient for circular data in directional statistics [6]. A notable feature of the GLD is that fixing the gene pair with the shortest GLD among all gene pairs between two compared genomes in the same direction yields the most similar configuration of all related genes in the compared genomes. Using this feature, we find the shortest GLD between the two half genomes generated at each cutting angle.
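As an illustration of the three composition measures given by the formulas earlier in the Methods, the following is a minimal sketch and not the authors' code. The function name `composition_measures` is hypothetical, the GC skew is read as C minus G (the order in which the symbols appear in the text), and ambiguous bases such as N are simply ignored, which is an assumption.

```python
from collections import Counter

def composition_measures(seq: str) -> dict:
    """Return GC content, GC skew and AT skew for one gene sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # assumption: bases other than A/T/G/C are ignored
    return {
        # (f_G + f_C) / (f_A + f_T + f_G + f_C)
        "gc_content": (g + c) / total if total else 0.0,
        # (f_C - f_G) / (f_C + f_G), sign convention as read from the text
        "gc_skew": (c - g) / (c + g) if (c + g) else 0.0,
        # (f_A - f_T) / (f_A + f_T)
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
    }

# Toy sequence with A=2, C=3, G=1, T=1
measures = composition_measures("AACCCGT")
```

Computing the measures per gene, rather than in sliding windows, matches the per-gene description in the text.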

Figure 2: Procedure for comparing gene locations between two half genomes of Anabaena. The actual Anabaena genome (solid circle A) is divided into two half regions, 1 and 2, by cutting it at angles $\theta_A$ and $\theta_A + 180^\circ$ (broken line through circle A). Then two hypothetical genomes (broken circles A1 and A2) are constructed from these two half regions by setting the angle $\theta_A$ or $\theta_A + 180^\circ$ on the original genome A as the angle 0° on the A1 or A2 genome, respectively. Paralog pairs are searched between these two hypothetical genomes, and gene-location distances (GLDs) are calculated for all of these paralog pairs by rotating the two hypothetical genomes. Among the GLDs, the shortest GLD is plotted against the cutting angle on the Anabaena genome. The above operation is repeated at the location of every protein gene.

The GLD for gene pair i between circular genomes A and B is defined by the following equation:

$$
D_i^{g(A,B)} = 0.5 \left[ 1 - \frac{\sum_{j,\, j \ne i}^{n} \sin(\theta_i^A - \theta_j^A)\, \sin(\theta_i^B - \theta_j^B)}{\sqrt{\sum_{j,\, j \ne i}^{n} \sin^2(\theta_i^A - \theta_j^A)\; \sum_{j,\, j \ne i}^{n} \sin^2(\theta_i^B - \theta_j^B)}} \right] \qquad (1)
$$

where n is the total number of paralog pairs between the two compared genomes, and $\theta_i^A$ (or $\theta_j^A$) and $\theta_i^B$ (or $\theta_j^B$) denote the angles of the ith (or jth) paralog pair on A and B, respectively. The distances range from 0.0 to 1.0 depending on the degree of dissimilarity of the pair's gene locations to the locations of the other gene pairs on the two genomes. Because the GLD is invariant to the choice of position from which the angles are measured [9], in this study the angles of genes on a genome are defined by the 5′ positions of the coding regions of genes in the GenBank-format file, regardless of the strand on which a gene is encoded.
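The GLD and the cutting step of Fig. 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Eq. (1) is read as 0.5 times one minus a circular correlation taken over the other pairs (a reading consistent with the stated range of 0.0 to 1.0), the function names `gld` and `half_genome_angle` are hypothetical, and the half arcs are not rescaled to a full circle because the text does not say whether they are.

```python
import math

def gld(angles_a, angles_b, i):
    """Gene-location distance of pair i between genomes A and B.

    angles_a[k] and angles_b[k] are the angles (degrees) of the two
    members of paralog pair k on genome A and genome B.
    """
    num = den_a = den_b = 0.0
    for j in range(len(angles_a)):
        if j == i:
            continue
        sa = math.sin(math.radians(angles_a[i] - angles_a[j]))
        sb = math.sin(math.radians(angles_b[i] - angles_b[j]))
        num += sa * sb
        den_a += sa * sa
        den_b += sb * sb
    if den_a == 0.0 or den_b == 0.0:
        raise ValueError("degenerate configuration: all sine terms vanish")
    return 0.5 * (1.0 - num / math.sqrt(den_a * den_b))

def half_genome_angle(angle, cut):
    """Map an angle on the original genome to (half, angle on that half).

    Half 1 starts at `cut` and half 2 at `cut + 180`; each starting point
    becomes angle 0 on its hypothetical genome (assumption: no rescaling).
    """
    offset = (angle - cut) % 360.0
    return (1, offset) if offset < 180.0 else (2, offset - 180.0)

# Identical configurations up to a rotation give a distance near 0.
a = [10.0, 80.0, 150.0, 230.0, 300.0]
b = [(x + 40.0) % 360.0 for x in a]
distance = gld(a, b, 0)
```

Because only angle differences enter the formula, a common rotation of one genome leaves the distance unchanged, while a mirrored configuration gives the maximum distance.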

3 Results

3.1 Assessment of Tandem Gene Duplication

In this section, we investigate the possibility of tandem gene duplication by focusing on the angles between paralog pairs on the Anabaena genome. The numbers of paralog families and paralog pairs detected with BLASTCLUST [1] are 581 and 17,955, respectively. The angles are calculated for all paralog pairs in each paralog family. The frequency distribution of the angles between paralogs is shown in Fig. 3. As seen in the figure, the distribution is almost uniform. This result indicates that many paralogs are not clustered within one region but are distributed at various intervals on the Anabaena genome. If most Anabaena paralogs had been created by tandem gene duplications, the distribution in Fig. 3 would be expected to be skewed toward small angles. Although the number of paralog pairs in the range of 0-10° is somewhat larger than in other ranges, the fraction amounts to only 8.1%. Translocation of one of the two genes after a tandem duplication is a possible explanation for the near uniformity of the distribution. The rate of translocation, however, is not known in bacterial genomes, and thus we have no information on how frequently translocations occur in the Anabaena genome. In the present study, we therefore judge that it is difficult to explain the large size of the Anabaena genome by tandem gene duplications.

Figure 3: Frequency distribution of angles between the locations of paralog pairs on the Anabaena genome. The angles between paralog pairs are calculated for all combinations of paralogs in each paralog family.
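The angle distribution described in this section can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, gene positions are assumed to be given in base pairs, the separation between two genes is assumed to be the smaller arc (0 to 180°), and the 10° bin width follows the range quoted in the text.

```python
from itertools import combinations

def position_to_angle(position_bp, genome_length_bp):
    """Convert a position on a circular genome to an angle in degrees."""
    return 360.0 * position_bp / genome_length_bp

def angular_separation(a, b):
    """Smaller arc between two angles, in the range 0 to 180 degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def separation_histogram(families, genome_length_bp, bin_width=10.0):
    """Count paralog-pair separations per bin over 0 to 180 degrees.

    families: list of paralog families, each a list of gene positions (bp).
    All pairwise combinations within a family are counted.
    """
    n_bins = int(180.0 / bin_width)
    counts = [0] * n_bins
    for family in families:
        angles = [position_to_angle(p, genome_length_bp) for p in family]
        for a, b in combinations(angles, 2):
            sep = angular_separation(a, b)
            # A separation of exactly 180 degrees goes into the last bin.
            counts[min(int(sep // bin_width), n_bins - 1)] += 1
    return counts

# Toy genome of 3,600 bp, so that 10 bp corresponds to 1 degree.
toy_counts = separation_histogram([[0, 50, 1800], [100, 3550]], 3600)
```

If tandem duplication dominated, most of the counts would fall into the first bin; a flat histogram corresponds to the nearly uniform distribution reported for Fig. 3.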

3.2 Assessment of Horizontal Gene Transfer

In this section, the possibility of horizontal gene transfer is investigated on the basis of the values of GC content, GC skew, AT skew and CAI for each Anabaena gene. The values are calculated for the genes of all 5,366 proteins encoded on the Anabaena chromosome. The four measures are plotted against gene location on the genome in Fig. 4. As is easily seen in the figure, only small fractions of genes display atypical base composition or biased codon usage. Indeed, the numbers of genes with a chance probability of less than 5% in the distributions of Figs. 4(a)-(d) are 300 (5.6%), 303 (5.7%), 357 (6.7%) and 272 (5.1%), respectively. Furthermore, among the deviating genes, we list as candidates for recently transferred genes those whose homologs are detected by BLASTP, under an e-value criterion, only in bacteria other than cyanobacteria. The numbers of genes thus listed are only 37, 31, 18 and 28 for the respective distributions. These results indicate that most protein-encoding genes on the contemporary Anabaena genome may be native genes vertically descended from a direct ancestor of the Anabaena lineage. Therefore, it seems difficult to explain the large size of the Anabaena genome by horizontal gene transfers.

Figure 4: Plots of values of (a) GC content, (b) GC skew, (c) AT skew and (d) codon adaptation index (CAI) for each protein gene against its angle on the Anabaena genome.

3.3 Assessment of Whole-Genome Duplication

In this section, the possibility of whole-genome duplication is investigated using the GLD measure. As described in Fig. 2, the Anabaena genome is divided into two half genomes by cutting the actual genome at the angle of each paralog, and the locations of paralogs are compared between the two halves with the GLD.
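For the 5% criterion used in the horizontal-transfer assessment above, one simple way to flag deviating genes is sketched here. The text does not state how the chance probability was computed, so the two-sided normal approximation (|z| > 1.96) is an assumption, and the function name `flag_deviating` is hypothetical.

```python
import statistics

def flag_deviating(values, z_cutoff=1.96):
    """Return indices of values deviating from the mean at the 5% level.

    Assumption: the per-gene values are treated as approximately normal,
    so |z| > 1.96 corresponds to a two-sided chance probability below 5%.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > z_cutoff]

# Toy GC contents: nineteen typical genes and one atypical gene.
gc_values = [0.40] * 19 + [0.70]
candidates = flag_deviating(gc_values)
```

The same function can be applied separately to GC content, GC skew, AT skew and CAI; genes flagged this way would then still need the homolog search described in the text before being treated as transfer candidates.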

A plot of GLDs is shown in Fig. 5. The GLD plot shows a periodic pattern at 180° intervals, because the pair of hypothetical genomes obtained at a cutting angle $\theta_A$ is the same as that obtained at $\theta_A + 180^\circ$. In the figure, we find the shortest GLDs at the two cutting angles of 85° and 265°. In other words, the paralog locations in the two hypothetical genomes show the most similar configuration when the Anabaena genome is cut at angles around 85° and 265°. Furthermore, the shortest GLD is statistically significant (P < 0.05) under an extreme-value distribution of GLDs obtained by a simulation that randomizes paralog pairs. This indicates that the Anabaena genome may be composed of the two regions delimited by these cutting angles. In summary, the gene-configuration comparison indicates that whole-genome duplication is reasonable as a molecular cause of the large size of the Anabaena genome.

Figure 5: Plot of the shortest GLDs against cutting angle on the Anabaena genome. The shortest GLDs were found at the cutting angles of all1273 (84.9°) and all3916 (265.0°). Each value is obtained when the two hypothetical genomes are rotated so that the angle of one paralog on one hypothetical genome agrees with that of its partner on the other; the paralog pairs are all3016 (205.7°) - all0012 (0.6°) when cut at 84.9° and alr5317 (356.5°) - asr3019 (205.8°) when cut at 265.0°. The location (µ), scale (σ) and shape (ξ) parameters of the extreme-value distribution obtained by a simulation of randomizing paralog pairs are (µ, σ, ξ) = (0.457, 0.014, ).

4 Discussion and Conclusions

The results of our analyses indicate that only a small fraction of the Anabaena proteins may have originated from tandem gene duplication or horizontal gene transfer. This implies that it is difficult to explain the large size of the Anabaena genome by causes operating on the local structure of gene segments, single genes or strings of several genes. In contrast, the comprehensive comparison of gene configuration with the GLD measure reveals that the contemporary Anabaena genome may consist of two half regions. This feature may be ascribed to whole-genome duplication, which operates globally on all genes encoded on the genome. These results convince us that, in this genomic era, bacterial genomes need to be studied from a more macroscopic view of their global structure as well as from the current view that treats each gene at the level of amino acid or nucleotide sequences.

Based on the result of the gene-configuration comparison, we further examine the distribution pattern of genes on the two regions of the Anabaena genome with respect to functional categories. The result is summarized in Table 1. The gene distributions on the two half genomes are significantly biased ($P < 10^{-6}$). The subsequent residual analysis revealed a remarkable feature of the functional categories: 73% of the proteins in the category Photosynthesis and respiration and 63% of those in Translation are both encoded on one of the two regions. On the other hand, the categories whose distributions are biased toward the other region are Central intermediary metabolism and Transport and binding proteins. Interestingly, most proteins in the former categories have housekeeping functions that are essential to cyanobacteria, while many proteins in the latter categories have functions needed in particular environments, such as proteins involved in nitrogen fixation and metabolism under nitrogen deprivation [5]. The distribution patterns of functional categories may reflect the process through which the Anabaena genome has acquired protein genes with novel functions.
Table 1: Distribution pattern of protein genes on the Anabaena genome with respect to functional categories.

Columns: No.; Functional category (a); Number of protein genes; P value (b)

Amino acid biosynthesis
Biosynthesis of cofactors, prosthetic groups, and carriers
Cell envelope
Cellular processes
Central intermediary metabolism (b)
Energy metabolism
Fatty acid, phospholipid and sterol metabolism
Photosynthesis and respiration (b)
Purines, pyrimidines, nucleosides, and nucleotides
Regulatory functions
DNA replication, recombination, and repair
Transcription
Translation (b)
Transport and binding proteins (b)
Other categories

The number of protein genes included in each functional category is counted in the two regions of the Anabaena genome. The distribution of functional categories over the two regions is significantly non-independent by the chi-squared test of independence ($\chi^2 = 58.4$, $P < 10^{-6}$).
a Classification of functional categories and assignment of each protein to the categories follow those in Kaneko et al. [11].
b Functional categories showing a significantly biased distribution in the residual analysis are marked (b). P values are based on $N(0, 1^2)$.
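The test of independence and the residual analysis behind Table 1 can be sketched as follows. This is a sketch under assumptions, not the authors' procedure: the residuals are taken to be adjusted standardized residuals (a common choice that is approximately N(0, 1) under independence), the function names are hypothetical, and the counts in the example are made up because the table values are not reproduced here.

```python
import math

def chi_square_with_residuals(table):
    """Chi-squared statistic, degrees of freedom and adjusted residuals.

    table: rows are functional categories, columns are the two regions.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
            # Variance term of the adjusted residual (assumed definition).
            var = exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            res_row.append((obs - exp) / math.sqrt(var))
        residuals.append(res_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, residuals

def two_sided_normal_p(z):
    """Two-sided P value of a residual under the standard normal N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical 2 x 2 counts, for illustration only.
chi2, dof, residuals = chi_square_with_residuals([[10, 20], [20, 10]])
```

A category would be reported as biased when the P value of its residual falls below the chosen significance level, with the sign of the residual indicating which region holds the excess.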

As proposed by Ohno [23], whole-genome duplication can be a quick and easy way for an organism to amplify its gene repertoire with respect to protein functions. In this sense, the gene-configuration similarity and the biased gene distribution between the two hypothetical half genomes support the possibility of whole-genome duplication in the Anabaena genome. Ancient whole-genome duplications (paleopolyploidy) have already been proposed in some eukaryotic lineages [30]. In bacterial genomes as well, such an event might have occurred and helped bacterial species expand their capability to adapt to various environments on the earth through amplification of their gene repertoire.

Acknowledgments

One of the authors (K. H.) was partly supported by a Grant-in-Aid for Scientific Research on Priority Areas "Genome Information Science" (grant ) and for Scientific Research (B) (grant ), from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References

[1] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, Z., Miller, W., and Lipman, D.J., Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res., 25.
[2] Andersson, S.G.E. and Kurland, C.G., Reductive evolution of resident genomes, Trends Microbiol., 6.
[3] Carbone, A., Zinovyev, A., and Képès, F., Codon adaptation index as a measure of dominating codon bias, Bioinformatics, 19.
[4] Dufresne, A., Salanoubat, M., Partensky, F., et al., Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome, Proc. Natl. Acad. Sci. USA, 100.
[5] Ehira, S., Ohmori, M., and Sato, N., Genome-wide expression analysis of the responses to nitrogen deprivation in the heterocyst-forming cyanobacterium Anabaena sp. strain PCC7120, DNA Res., 10:97-113.
[6] Fisher, N.I. and Lee, A.J., A correlation coefficient for circular data, Biometrika, 70.
[7] Gu, Z., Cavalcanti, A., Chen, F.-C., Bouman, P., and Li, W.-H., Extent of gene duplication in the genomes of Drosophila, nematode, and yeast, Mol. Biol. Evol., 19.
[8] Horimoto, K., Fukuchi, S., and Mori, K., Comprehensive comparison between locations of orthologous genes on archaeal and bacterial genomes, Bioinformatics, 17.
[9] Horimoto, K., Suyama, M., Toh, H., Mori, K., and Otsuka, J., A method for comparing circular genomes from gene locations: application to mitochondrial genomes, Bioinformatics, 14.
[10] Jain, R., Rivera, M.C., Moore, J.E., and Lake, J.A., Horizontal gene transfer accelerates genome innovation and evolution, Mol. Biol. Evol., 20.
[11] Kaneko, T., Nakamura, Y., Wolk, C.P., et al., Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC7120, DNA Res., 8, 2001.
[12] Kaneko, T., Sato, S., Kotani, H., et al., Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions, DNA Res., 3.
[13] Kunisawa, T. and Otsuka, J., Periodic distribution of homologous genes or gene segments on the Escherichia coli K12 genome, Protein Seq. Data Anal., 1.
[14] Lawrence, J.G. and Ochman, H., Amelioration of bacterial genomes: Rates of change and exchange, J. Mol. Evol., 44.
[15] Lawrence, J.G. and Ochman, H., Molecular archaeology of the Escherichia coli genome, Proc. Natl. Acad. Sci. USA, 95.
[16] Maniloff, J., The minimal cell genome: On being the right size, Proc. Natl. Acad. Sci. USA, 93.
[17] Mira, A., Ochman, H., and Moran, N.A., Deletional bias and the evolution of bacterial genomes, Trends Genet., 17.
[18] Moszer, I., Rocha, E.P.C., and Danchin, A., Codon usage and lateral gene transfer in Bacillus subtilis, Curr. Opin. Microbiol., 2.
[19] Mrázek, J., Bhaya, D., Grossman, A.R., and Karlin, S., Highly expressed and alien genes of the Synechocystis genome, Nucleic Acids Res., 29.
[20] Nakamura, Y., Kaneko, T., Sato, S., et al., Complete genome structure of the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1, DNA Res., 9.
[21] Nakamura, Y., Kaneko, T., Sato, S., et al., Complete genome structure of Gloeobacter violaceus PCC7421, a cyanobacterium that lacks thylakoids, DNA Res., 10.
[22] Ochman, H., Lawrence, J.G., and Groisman, E.A., Lateral gene transfer and the nature of bacterial innovation, Nature, 18.
[23] Ohno, S., Evolution by gene duplication, Springer-Verlag, New York.
[24] Palenik, B., Brahamsha, B., Larimer, F.W., et al., The genome of a motile marine Synechococcus, Nature, 424.
[25] Riley, M. and Anilionis, A., Evolution of the bacterial genome, Annu. Rev. Microbiol., 32.
[26] Rocap, G., Larimer, F.W., Lamerdin, J., et al., Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation, Nature, 424.
[27] Sharp, P.M. and Li, W.-H., The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications, Nucleic Acids Res., 15.
[28] Snel, B., Bork, P., and Huynen, M.A., Genomes in flux: The evolution of archaeal and proteobacterial gene content, Genome Res., 12:17-25.
[29] Wallace, D.C. and Morowitz, H.J., Genome size and evolution, Chromosoma, 40.
[30] Wolfe, K.H., Yesterday's polyploids and the mystery of diploidization, Nature Rev. Genet., 2.
[31] ftp://ftp.ncbi.nih.gov/genbank/genomes/bacteria/nostoc_sp/


BACTERIA AND ARCHAEA 10/15/2012 BACTERIA AND ARCHAEA Chapter 27 KEY CONCEPTS: Structural and functional adaptations contribute to prokaryotic success Rapid reproduction, mutation, and genetic recombination promote genetic diversity in

More information

Hiromi Nishida. 1. Introduction. 2. Materials and Methods

Hiromi Nishida. 1. Introduction. 2. Materials and Methods Evolutionary Biology Volume 212, Article ID 342482, 5 pages doi:1.1155/212/342482 Research Article Comparative Analyses of Base Compositions, DNA Sizes, and Dinucleotide Frequency Profiles in Archaeal

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

Unsupervised Learning in Spectral Genome Analysis

Unsupervised Learning in Spectral Genome Analysis Unsupervised Learning in Spectral Genome Analysis Lutz Hamel 1, Neha Nahar 1, Maria S. Poptsova 2, Olga Zhaxybayeva 3, J. Peter Gogarten 2 1 Department of Computer Sciences and Statistics, University of

More information
