Mass Identification of Chloroplast Proteins of Endosymbiont Origin by Phylogenetic Profiling Based on Organism-Optimized Homologous Protein Groups


Genome Informatics 16(2) (2005)

Naoki Sato 1 (naokisat@bio.c.u-tokyo.ac.jp), Masayuki Ishikawa 1 (Ishimasa@bio.c.u-tokyo.ac.jp), Makoto Fujiwara 1 (mtf1@bio.c.u-tokyo.ac.jp), Kintake Sonoike 2 (sonoike@k.u-tokyo.ac.jp)

1 Department of Life Sciences, Graduate School of Arts and Sciences, University of Tokyo, Komaba, Meguro-ku, Tokyo, Japan
2 Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwanoha, Kashiwa-shi, Chiba, Japan

Abstract

Chloroplasts originated from an ancient cyanobacterium-like endosymbiont. Several tens of chloroplast proteins are encoded by the chloroplast genome, while many hundreds are encoded by the nuclear genome in plants and algae, but the exact number and identity of nuclear-encoded chloroplast proteins are still unknown. We describe here attempts to identify a large number of unidentified chloroplast proteins of endosymbiont origin (CPRENDOs). Our strategy consists of whole-genome protein clustering by the homolog group method, which is optimized for organism number, and phylogenetic profiling that extracts groups conserved in cyanobacteria and photosynthetic eukaryotes. An initial minimal set of CPRENDOs was predicted without targeting prediction and experimentally validated.

Keywords: genomic clustering, homolog group, CPRENDO, Gclust, endosymbiosis

1 Introduction

The chloroplast is a photosynthetic organelle within plant and algal cells. In flowering plants it is also present as chromoplast, amyloplast, elaioplast, and leucoplast, depending on the cell type. The general term for all these organelles related to the chloroplast is plastid. The plastid is also involved in various metabolic processes, such as the biosynthesis of fatty acids, isoprenoids, tetrapyrroles, amino acids, and some plant hormones.
It is also the sole site of assimilation of nitrogen and sulfur in plant cells.

Plants (and algae) acquired chloroplasts by endosymbiosis, which occurred 1.6 Ga (billion years ago) [22]. The endosymbiont was closely related to present-day cyanobacteria [4], but it is still not clear which cyanobacterium is most closely related to the chloroplast ancestor. This endosymbiosis theory is supported by the fact that the genes encoded in the chloroplast genomes are phylogenetically most closely related to their orthologs in cyanobacteria. Indeed, the endosymbiosis was a major event of massive gene transfer from cyanobacteria to photosynthetic eukaryotes, and is a good target for comparative genomic studies.

In algae and plants, many chloroplast proteins are encoded by the nuclear genome, and many of them are thought to have been transferred from the ancient endosymbiont. Chloroplasts also use proteins of eukaryotic origin. Therefore, the chloroplast proteome is a chimera of proteins that originated from both the endosymbiont and the eukaryotic host [1, 13, 17]. However, photosynthesis-related proteins and the enzymes involved in chloroplast biogenesis (transcription and translation) are mostly of endosymbiont origin. Based on this consideration, we tried to estimate the list of chloroplast proteins that were acquired by the endosymbiotic event. This is a good challenge, perhaps the best of its kind, for comparative genomics [3, 5].

We present here a generally applicable method of phylogenetic profiling, which focuses on unidentified proteins that are conserved in a certain group of organisms that share a common physiological property or pathway. After the initial presentation at GIW three years ago [18], we made efforts in both computational and experimental work [19, 20]. On the computational side, the Gclust software has been revised as described below, and the clique mode has been implemented. In addition, the use of an intermediate file has facilitated rapid analysis with different parameters. On the experimental side, a minimal set of CPRENDOs estimated using the old version of Gclust was analyzed. In the present communication, we present the results of a revised prediction of CPRENDOs based on the current version of Gclust as well as the results of experimental verification of the minimal set of CPRENDOs, and discuss the effectiveness of phylogenetic profiling in comparative genomics.

2 Method

2.1 Major Features of the Methodology

The following points are emphasized in the present study:

(1) Use of homolog groups but NOT ortholog groups (based on bidirectional best hit)

A commonly used method for phylogenetic clustering relies on ortholog groups. Two genes (or proteins) are defined as orthologs if they originate from an identical ancestral gene. However, in computational biology, orthologs are operationally defined by a bidirectional best-hit relationship inferred by BLAST or SSEARCH analysis. In practice, every genome contains several paralogs or highly related genes, such as those encoding protein families, and it is not always easy or practical to identify the correct orthologs (as originally defined) without phylogenetic analysis.
We have been using the homolog group method [7], in which all homologous proteins are included in each cluster. This method allows detailed phylogenetic analysis of the homolog group to identify true orthologs.

(2) Use of both E-value and homologous regions of BLASTP output for clustering

Many clustering practices use BLASTP or SSEARCH data for hierarchical clustering with a single criterion such as the E-value. Use of such a simple criterion in the BLASTP-based homolog group method produces large aggregates of various proteins bridged by multidomain proteins [18, 19]. To avoid this, we use homologous region information to infer both the overlap score and the domain structure. The overlap score for two proteins is defined as the sum of the total overlap regions in both proteins divided by the total length of both proteins, namely, (a1 + a2 + b1 + b2)/(length1 + length2) in the example shown in Figure 1. Constraints on E-value, overlap score, and domain structure are used for clustering to infer truly homologous proteins by excluding functionally different proteins that share one or several domains.

Figure 1: An example showing calculation of overlap score. Homologous regions are indicated.
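The overlap score defined above can be illustrated with a minimal Python sketch. This is not the Gclust implementation (Gclust is written in C); the function names, the 1-based inclusive coordinate convention, and the decision to count residues covered by more than one aligned segment only once are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: one aligned segment as 1-based inclusive (start, end).
Region = Tuple[int, int]


def covered_length(regions: List[Region]) -> int:
    """Number of residues covered by the segments, counting overlaps once (assumption)."""
    total = 0
    current_end = 0
    for start, end in sorted(regions):
        if end <= current_end:
            continue  # segment lies entirely inside an earlier one
        total += end - max(start, current_end + 1) + 1
        current_end = end
    return total


def overlap_score(regions_a: List[Region], length_a: int,
                  regions_b: List[Region], length_b: int) -> float:
    """Sum of homologous regions in both proteins divided by the sum of both lengths."""
    return (covered_length(regions_a) + covered_length(regions_b)) / (length_a + length_b)


# Two homologous regions of 50 residues each on a 200-aa and a 300-aa protein.
example_score = overlap_score([(11, 60), (101, 150)], 200,
                              [(1, 50), (201, 250)], 300)
print(example_score)  # 0.4
```

A pair that shares only one short domain of two long multidomain proteins gets a low score under this definition, which is why the score helps to separate true homologs from proteins bridged by a common domain.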

(3) Organism-optimized clustering

If a homolog group is constructed solely on the basis of sequence similarity, the clusters are not always suitable for phylogenetic profiling. A cluster may contain many proteins of the same family, or a single protein family may be split into several different clusters according to phylogenetic position. This problem is partially solved during the initial cluster formation using the 2D table (see below) and at the last stage of clustering.

(4) Experimental validation of computational estimation

We believe that any bioinformatics inference should be experimentally validated. In many informatics studies, logical consistency is the sole criterion for evaluating a computational estimate. But biologically meaningful results are what matter most in bioinformatics. Inference of chloroplast proteins of endosymbiont origin (CPRENDOs) may be one of the best applications of phylogenetic profiling that can be experimentally verified.

2.2 Preparation for Clustering

All proteins in the selected genomes were clustered by the homolog group method. To this end, one of the authors (NS) developed software called Gclust, which reads all-against-all BLASTP results and outputs a list of homologous protein groups (homolog groups). The software was written in C, and runs on any common UNIX machine if enough memory is available. The overall flow of data processing is shown in Figure 2. A typical source of genomic protein data is a GenBank flat file. The gbk file was processed to produce a FASTA file and an annotation file. Such data from various genomes were assembled into a single FASTA file and an annotation table. For eukaryotic organisms, nuclear as well as organellar (mitochondrial and chloroplast, if present) genomes were used. The two files were processed to give another FASTA file (**.gfa) and an annotation table (**.g.table). In the **.gfa file, all protein names were converted to numbers to save disk space during the BLASTP search.
The numbers can be converted back to the original protein names by referencing the **.g.table. Next, an all-against-all BLASTP search [2] was done using the FASTA file (**.gfa) as input. The output was piped directly into bl2ls3.pl to produce a list of homology regions and E-values, using an E-value threshold of 1e-3. The resultant file was then used as input to the Gclust software. The BLASTP step is the most time-consuming step, and is done as multiple jobs with split files on several different servers. All sequence file manipulation, such as format conversion and file splitting, was done with the SISEQ software (version 1.30) [16].

2.3 Organism-Optimized Clustering with Gclust Software

The Gclust software [18, 23] was run in the clique mode. The BLASTP results were processed in the following two steps (Figure 2). In the first step, the data were read, partially transformed into an intermediate data format, and saved in a large file, data.out, for further analysis with various parameter settings. Low-homology data were removed, with the E-value threshold relaxed for short sequences (from 1e-6 for >100 aa to 1e-3 for <40 aa). All single-path relations were picked up from the homology data. The domain composition of each protein was also estimated using the homology regions with different subject proteins. At this stage, multi-domain proteins as well as very large proteins (>2,000 aa, for example) were marked with a flag.

In the second step, Gclust reads the data.out file and performs clustering using the -clique option, which produces a good clustering result in a relatively short time (within one day for a dataset containing 141 organisms). In the clique mode, the homology data were converted to a structure called match, which held the data of binary (i.e., protein-to-protein) similarity, namely, E-value, overlap score, and domain composition estimated as above. Normally, the clique mode uses a list of organisms provided by the org list file.
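The length-dependent E-value filter mentioned above can be sketched as follows. The text gives only the two end points (1e-6 for >100 aa and 1e-3 for <40 aa); the log-linear interpolation between 40 and 100 aa and the use of the shorter of the two sequences are assumptions of this sketch, and the function names are hypothetical rather than taken from Gclust.

```python
def evalue_cutoff(length: int) -> float:
    """E-value threshold for a sequence of the given length (aa)."""
    if length > 100:
        return 1e-6
    if length < 40:
        return 1e-3
    # Assumption: interpolate the exponent linearly from -3 (40 aa) to -6 (100 aa).
    fraction = (length - 40) / 60
    return 10 ** (-3 - 3 * fraction)


def keep_hit(evalue: float, length_a: int, length_b: int) -> bool:
    """Keep a BLASTP hit if its E-value passes the cutoff for the shorter protein (assumption)."""
    return evalue <= evalue_cutoff(min(length_a, length_b))


# A hit with E = 1e-4 is kept for a 35-aa protein but rejected for two long proteins.
print(keep_hit(1e-4, 35, 500))
print(keep_hit(1e-4, 300, 500))
```

Relaxing the threshold for short proteins matters here because many small photosystem subunits are well under 100 residues and can never reach the E-values attainable by long proteins.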
For each protein, all match data were tabulated in 2D, using E-value and overlap score (Figure 4A). The 2D table lists the distribution of match array data using a pre-defined category scheme, which can be customized using a configuration file called var list. Match data were selected one by one, starting from the initially selected best local maximum. The search scanned in a circular or diamond manner around the initial starting point. The scanning toward lower overlap score and higher E-value stopped if the number of members increased again, because this is a sign of another group of homologs with lower similarity. This operation was done on a shadow table (Figure 4B), in which non-negative values indicate the selected area, and the increasing number indicates the path of the search. By applying such criteria, among others, a clearly defined cluster of match data with respect to E-value and overlap score was selected (boxed area). In addition, match data were selected to include as many organisms as possible but without picking up very low similarity data (the output below Figure 4A and B).

After such purification of match data, a list of homologs was made for each protein. The threshold E-value and overlap score were also stored. Then, homolog clusters were formed by merging individual lists. At this stage, clusters with very different threshold E-values were not merged. After repeated merging and removal, orphan entries generated by the removal step were again incorporated into the most adequate cluster. Clusters were again optimized for the number of organisms. Homolog groups were sorted according to the number of entries. Finally, homolog groups were printed out to a large file as a concatenated similarity matrix (Figure 5A). The matrix may be expressed in 1/0 (similar/dissimilar), E-value, and/or overlap score, depending on output options, 1, r, and/or s, respectively.

3 Results

3.1 Prediction of CPRENDOs by Phylogenetic Profiling

Using a Perl script, homologtableg3b.pl, the homology matrix was transformed into a table showing the members of each homolog group (Figure 3 and 5B).
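The kind of table described here (members of each homolog group, counted per organism) can be sketched in Python. This does not reproduce homologtableg3b.pl; the protein identifiers are invented, and the naming convention `organism_geneid` with an underscore separator is an assumption made only for this illustration.

```python
from collections import Counter
from typing import Dict, List


def organism_of(protein_name: str) -> str:
    """Extract the organism prefix from a hypothetical 'organism_geneid' name."""
    return protein_name.split("_", 1)[0]


def group_table(groups: Dict[int, List[str]]) -> Dict[int, Counter]:
    """For each homolog group, count the member proteins per organism."""
    return {
        group_id: Counter(organism_of(name) for name in members)
        for group_id, members in groups.items()
    }


# Invented example: group 1 has two paralogs in one cyanobacterium.
example_groups = {
    1: ["ath_g0001", "cme_g0100", "syn_g0200", "syn_g0201"],
    2: ["ath_g0002", "sce_g0300"],
}
example_table = group_table(example_groups)
for group_id, counts in sorted(example_table.items()):
    print(group_id, dict(counts))
```

Keeping counts rather than a simple presence flag preserves information about paralogs, while the presence or absence pattern across organisms is what the profiling step uses.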
This table was used to extract homolog groups that are shared by various combinations of organisms (phylogenetic profiling). Note that proteins encoded by both organellar and nuclear genomes were included in the data set of eukaryotic organisms. Therefore, we selected organisms rather than genomes in the phylogenetic profiling. For the prediction of CPRENDOs, we used a data set, CZ16Y, containing all predicted proteins of nine species of cyanobacteria, Arabidopsis thaliana (plant) [21], Cyanidioschyzon merolae (red alga) [14], three species of photosynthetic bacteria, two species of bacteria, and two eukaryotes. All data were taken from the GenBank data repository, except for those of Cyanidioschyzon, which were obtained from the Cyanidioschyzon Genome Project [24]. Cyanidioschyzon is a representative of the red lineage of photosynthetic eukaryotes, and we expected that the use of both a plant (green lineage) and a red alga would increase the accuracy of phylogenetic profiling.

The homolog groups that are shared by cyanobacteria, Arabidopsis and Cyanidioschyzon were selected (Table 1). At this step, various constraints were tested in the selection. Conservation in cyanobacteria was one constraint, and allowance for presence in other organisms was another. The first constraint could be complete conservation in all cyanobacteria (nine species), but many homologs of chloroplast proteins are not completely conserved in all cyanobacteria. A phylogenetic analysis suggests that plastids are sister to the Anabaena-Synechocystis clade (Sato, unpublished results). Therefore, Anabaena [12] and Synechocystis [11] could be used as representatives of cyanobacteria. But we found that not all chloroplast proteins are conserved in both of these cyanobacteria. We finally adopted a strategy in which any protein conserved in a certain number of cyanobacteria was selected, irrespective of the combination of cyanobacteria.
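The selection rule adopted here (conservation in a minimum number of cyanobacteria, in whatever combination, plus presence in both photosynthetic eukaryotes, with an optional restriction on other organisms) can be written as a short Python sketch. The organism labels and the function name are hypothetical, and the optional exclusion set is only one possible way to express the "allowance for presence in other organisms".

```python
from typing import AbstractSet, Dict, List


def select_groups(profiles: Dict[int, AbstractSet[str]],
                  cyanobacteria: AbstractSet[str],
                  required: AbstractSet[str],
                  min_cyano: int = 5,
                  excluded: AbstractSet[str] = frozenset()) -> List[int]:
    """Return IDs of homolog groups whose organism profile satisfies the constraints."""
    selected = []
    for group_id, organisms in profiles.items():
        if len(organisms & cyanobacteria) < min_cyano:
            continue  # too few cyanobacteria, in any combination
        if not required <= organisms:
            continue  # missing the plant or the red alga
        if organisms & excluded:
            continue  # present in an organism that is not allowed
        selected.append(group_id)
    return sorted(selected)


# Invented profiles: cy1..cy9 stand for the nine cyanobacteria.
cyanos = {f"cy{i}" for i in range(1, 10)}
example_profiles = {
    1: {"cy1", "cy2", "cy3", "cy4", "cy5", "ath", "cme"},
    2: {"cy1", "cy2", "ath", "cme"},
    3: cyanos | {"ath", "cme", "eco"},
    4: {"cy1", "cy2", "cy3", "cy4", "cy5", "cy6", "ath"},
}
wide = select_groups(example_profiles, cyanos, {"ath", "cme"})
strict = select_groups(example_profiles, cyanos, {"ath", "cme"}, excluded={"eco"})
print(wide, strict)  # [1, 3] [1]
```

Leaving `excluded` empty corresponds to the widest allowance for other organisms, while filling it restricts the selection to groups absent from the listed organisms.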
The number of cyanobacteria was also a variable, but five species (out of nine) gave satisfactory results (Figure 6 and Table 1).

Allowance for presence in other organisms was also tested. Table 1 compares the effects of allowance in photosynthetic bacteria and non-photosynthetic organisms. Photosynthetic bacteria perform photosynthesis without oxygen evolution, with a single photosystem using machineries that are distantly related to those of cyanobacteria and plants. Therefore, the inclusion of photosynthetic bacteria could affect phylogenetic profiling of CPRENDOs. In addition, paralogs of some photosynthesis-related proteins (ATP synthase, ribosomal proteins, and even a RuBisCO subunit) are present in non-photosynthetic organisms. These facts may make the profiling complicated. However, the results in Table 1 show that the allowance for other organisms has little effect on the proportion of clusters containing known chloroplast proteins, which was 0.44 to 0.49 in the different selections. In contrast, the proportion of clusters containing unknown proteins decreases with increasing allowance for other organisms. These results suggest that allowance for other organisms may be as wide as possible. Conservation in cyanobacteria, Arabidopsis and Cyanidioschyzon may be, therefore, a reliable criterion for selecting CPRENDOs.

A Venn diagram (Figure 6) shows that there are smaller numbers of homolog groups that are shared by cyanobacteria and Arabidopsis, or by cyanobacteria and Cyanidioschyzon. These groups could include proteins conserved in only the green or the red lineage, and may be studied as CPRENDO-like proteins. The groups shared by Arabidopsis and Cyanidioschyzon represent eukaryotic proteins, and are not candidates for CPRENDOs.

Figure 2: Flow chart of data processing until formation of homology matrix.

Figure 3: Flow chart of data processing for further analysis towards phylogenetic analysis.

Figure 4: Selection of match data in the clique mode. A. 2D table (rows, overlap score; lines, E-value) showing distribution of match data. A best local maximum is selected first (circle). Other local maxima with lower similarity are indicated by dotted circles. B. A shadow table for working. Zero is the start of search. Non-negative values show selected groups.

Figure 5: Example output of Gclust (A) and tabular summary of homologs as generated by further processing (B). In (A), protein name (combination of genome name and gene identifier), number of amino acid residues, similarity matrix, and annotation in the original database are listed from left to right. The similarity matrix is a square matrix, having an identical set of proteins in both vertical and horizontal directions. Each protein belongs to a single group. The similarity detected by BLASTP but not incorporated into the clustering is listed below the main matrix as Related groups. Each line in Related groups consists of group number (number of members in parentheses) and protein name. In (B), all homolog groups are listed with the number of members in each genome. The annotation is taken from the first member.

Figure 6: A Venn diagram showing homolog groups shared by the three organism categories, Arabidopsis thaliana (green plant), Cyanidioschyzon merolae (red alga) and 5 cyanobacteria. Here, 5 Cyanos indicates >=5 of the 9 cyanobacteria analyzed. This result was obtained with the selection method G shown in Table 1. In this diagram, each area is drawn proportional to the number of groups using a tcl/tk software called TriGraph (Sato, unpublished).

Table 1: Number of homolog groups selected with different criteria. The numbers of homolog groups that are conserved in at least 5 of 9 cyanobacteria, Arabidopsis (Ath), and Cyanidioschyzon (Cme) are listed with varying additional conservation in photosynthetic bacteria (PhotoBact) and other organisms (Others). Others include C. elegans, S. cerevisiae, E. coli and B. subtilis. The numbers of homolog groups consisting of known chloroplast proteins or unknown proteins are listed. Each number in parentheses indicates the proportion of groups. Finally, the numbers of members in Ath and Cme belonging to the selected groups are listed.
Columns: Selection; PhotoBact; Others; Number of groups; Known cp proteins; Unknowns; Number of Ath proteins; Number of Cme proteins
A (0.44) 44 (0.52)
D (0.46) 55 (0.49)
E (0.44) 72 (0.48)
F (0.48) 97 (0.31)
G (0.49) 127 (0.29)

3.2 Minimal Set of CPRENDOs

To verify the predicted CPRENDOs experimentally, we planned to analyze the plastid localization of the predicted CPRENDOs (see 3.3.1). The data used for the experimental study described below were predicted two years ago [18] using an older version of Gclust (version 2.1.2) and an older database (dataset CZ16). At that time, homolog groups were constructed simply by using various different E-values, and the homolog groups that were conserved in eight cyanobacteria, a red alga and a green plant, but not in other non-photosynthetic organisms, were selected at each cutoff E-value. The selected groups were combined and used as a minimal set of CPRENDOs (Table 3). In total, 51 homolog groups were selected. Among them, 19 were clusters of known chloroplast proteins, such as Psa and Psb proteins.
The remaining 32 groups were selected as targets of the initial experimental study. These homolog groups were generally included in selection A (Table 1) of the current CZ16Y dataset, with minor inconsistencies.

3.3 Experimental Verification of the Minimal Set of CPRENDOs

We performed experimental verification of the minimal set of CPRENDOs to test our idea that phylogenetic profiling is useful in predicting CPRENDOs, since we believe that all informatic predictions should be experimentally verified. The experimental verification consists of the following four analyses: localization of proteins, light-regulated expression, phenotype of cyanobacterial disruptants, and plant tag-lines.

3.3.1 Localization of Predicted CPRENDOs in A. thaliana

Localization of the predicted CPRENDOs was analyzed by using Green Fluorescent Protein (GFP)-fusion constructs. Each construct was prepared by successive PCR, and either linear DNA or plasmids were transiently transformed into onion epidermis by particle bombardment. The localization of the GFP-fusion protein was analyzed by fluorescence microscopy on the next day. The results (Table 2) showed that 49 out of 52 proteins were targeted to plastids. Interestingly, five proteins were also targeted to mitochondria. Such dual targeting is common in plant organellar proteins [10]. It should be noted that the localization predicted by TargetP [6] (not shown) was generally in good agreement, although six proteins were not correctly predicted to be targeted to chloroplasts.

3.3.2 Light-Dependent Expression of the Genes for CPRENDOs in A. thaliana

Expression of the predicted CPRENDOs was analyzed by RNA-blot analysis using 7-day-old seedlings, and the results are also shown in Table 2. As many as 36 genes encoding predicted CPRENDOs showed light-dependent expression, which is expected for proteins involved in photosynthesis or chloroplast biogenesis. Nine genes were constitutively expressed, while expression of seven genes was below the detection limit of the method employed. Cross-examination of localization and expression indicates an abundance (31 proteins) of light-regulated chloroplast proteins.

Table 2: Summary of localization and expression of predicted minimal set of CPRENDOs. Cp, chloroplasts (plastids); Mt, mitochondria; Cyto, cytoplasm; nuc, nucleus. L > D, expression in the light was higher than that in the dark; L = D, expression was comparable in the light and the dark; No exp, no expression was detected by RNA-blot analysis.
Expression categories: L > D; L = D; No Exp.
Localization categories: Cp; Cp & Mt; Mt; Cyto & nuc; Total

3.3.3 Analysis of Synechocystis Disruptants

The genes for the cyanobacterial homologs of predicted CPRENDOs were disrupted in Synechocystis sp. PCC. For this purpose, a rapid method for preparing disruption constructs was developed using repeated PCR. Among the 41 genes, 33 were disrupted completely, while five were not completely segregated and might represent essential genes. Three constructs were not successfully made, due to technical problems in PCR (PCR problem in Table 3). Fluorescence induction kinetics was measured as an indicator of photosynthetic performance (Table 3). A growth defect was also noted for some disruptants. In 22 disruptants, defects in growth or fluorescence kinetics were noted. These results suggest that the selected genes are important for normal growth in cyanobacteria.

3.3.4 Analysis of A. thaliana Mutant Lines

Mutants of the predicted CPRENDOs were analyzed using the SALK T-DNA tag-lines [25]. The analysis is still in progress, but we obtained homozygous lines for 25 CPRENDOs. During our experiments in the past two years, reports were published on four of the CPRENDOs, namely, Tab2 (in Chlamydomonas), Psb29/Thf1, APE1 and HY2. Except for Psb29, these are not components of the photosynthetic machinery, but are involved in its biogenesis.
This supports the correctness of our strategy, and many of the remaining CPRENDOs are also likely to be important in the biogenesis of the photosynthetic machinery. However, only two of the Arabidopsis mutant lines showed visible phenotypes, such as variegation. The CPRENDO gene in one of them has already been annotated as ycf65, a hypothetical chloroplast reading frame, because it is encoded in the chloroplast genome in some algae such as Cyanidioschyzon. A mutant of ycf65 in Synechocystis also showed a growth defect. The Ycf65 protein is likely to be important in both chloroplasts and cyanobacteria.

Table 3: Summary of results of functional analysis. Phenotypes of Synechocystis disruptants, localization and light regulation of Arabidopsis genes, and the number of homozygous tag-lines are listed. For Synechocystis mutants, mutant ID is indicated with phenotypes (segregation state, growth properties, and fluorescence properties, in this order), if present. Localization is shown in abbreviated words (see Table 2). Blue underline indicates light-regulated expression. Confirmed CPRENDOs are marked by bold characters. For homozygous tag-lines, visible phenotype is marked by bold red characters. In annotation, Ycf stands for hypothetical chloroplast ORF.
Columns: ID; Function reported during the work; Annotation; Synechocystis mutants, Mutant ID: Phenotype (segregation, growth, fluorescence); Arabidopsis, # of localization (UL=L>D, Bold=CPRENDO); Ath, # of homozygous tag-lines
5 Hypothetical 10; 11 2 cp, 1 mt 3
6 Hypothetical 12; (13: PCR problem) 1 cp
7 Ycf52 6: -, slow, - 2 cp 1
8 Hypothetical 14 2 cp 3
9 Yes Tab2 (Chlamydomonas)
10 Hypothetical 15; (16: PCR problem) 1 cp Not available 17: incomplete, light sensitive, low peak
12 Hypothetical (18: PCR problem)
13 Membrane protease
14 Hypothetical 19: -, light sensitive, very low peak 20: -, slow, high second peak; 21: -, -, Low peak 1 cp 1 cp, 1 cp, 1 (cp), 1 nuc 1 cp, 1 cp, 1 (cp, mt)
15 Hypothetical 22: -, slow, low peak 1 cp
16 Probable ferredoxin (2Fe-2S) 23: -, -, low second peak 1 (cp) 1 cp 1
18 Ycf cp 2
32 Ycf19-like 4: incomplete, -, - 1 cp Not available
19 Hypothetical 24 1 (cp)
21 Ycf60 25: -, pale green and light sensitive, - 1 cp, 1 cp 2
22 Hypothetical 26: -, -, low peak 1 (cp, mt) 1
23 Ycf65 27: -, slow, - 2 cp 1, 1
27 Hypothetical 28 1 (cp)
33 Yes Psb29/Thf1/APG5 29 (sll1414): no phenotype 2g20890 (cp) (Not tried)
34 Rubredoxin 30: -, -, low peak 1 cp Not available
35 Hypothetical 31: incomplete, slow, high peak 1 cp 1
39 Yes APE1 32(slr0575): -, -, low peak 5g38660 (cp) (Not tried)
40 Hypothetical 8, -, slow, - 1 cp 1
41 Hypothetical 33: -, -, very low peak 1 cp, 1 (cp) Not available
43 Hypothetical 34 1 cp Not available
44 Hypothetical 35: -, -, no decrease after peak 1 cp Not available 3

Table 4: Continuation of Table 3.
46 Yes HY2 (phycobilin synthesis) 36(slr0116): incomplete, -, high peak 3g09150 (1 cp) (Not tried)
47 Ycf20 37: -, slow, - 3 (cp, mt) 2
49 Hypothetical 38: -, slow, - 1 cp
51 Hypothetical 38 3 cp 3
54 Hypothetical 40 1 cp
55 Hypothetical
59 ATP-dependent proteinase
62 Hypothetical 41: -, slow, no decrease after peak 1 (cp) 45 2 cp 47: incomplete, light sensitive and slow, - 1 (cp)

4 Discussion

4.1 Evaluation of the Prediction Strategy of CPRENDOs

The present study shows that phylogenetic profiling is useful in predicting CPRENDOs. The essential methodology for predicting CPRENDOs consists of (1) constructing homolog groups from the total predicted proteins of both photosynthetic and non-photosynthetic organisms, and (2) selecting groups that are conserved in photosynthetic organisms under appropriate constraints. A probable estimate of the number of CPRENDOs is 1192 in Arabidopsis and 676 in Cyanidioschyzon. A previous study [13] estimated the upper limit of chloroplast proteins of endosymbiont origin as about 4,500 in Arabidopsis, and another study [1] also estimated the number of plant proteins that originated from the cyanobacterial endosymbiont. A more recent estimate was about 880 [15]. These estimates were obtained by calculation, not by complete enumeration. These studies also showed that a significant proportion of proteins of cyanobacterial origin might be located in non-chloroplast compartments, which is not the case in our results. This could be partly due to the limitation of targeting prediction [15], but also to the inaccuracy in the prediction. In contrast, the results of the present study on the minimal set of predicted CPRENDOs clearly indicate that almost all of them are chloroplast proteins, although no targeting prediction was used in the prediction process. A reasonable explanation of the discrepancy may be that we used conservation in 5 cyanobacteria, Arabidopsis, and Cyanidioschyzon as a criterion, while previous studies used conservation in only Arabidopsis and Synechocystis, or a similar simple criterion, which overestimates the number of proteins conserved in plants and cyanobacteria.
In addition, these previous studies relied on a simple one-plant versus one-cyanobacterium comparison with a single cutoff E-value for all proteins. Our approach, using phylogenetic profiling based on homolog groups, gives robust clusters, which could yield a more solid prediction.

4.2 General Usefulness of Phylogenetic Profiling

The general success of our approach to comparative genomics prompted us to extend phylogenetic profiling to the prediction of various other proteins that are conserved in a certain group of organisms. Prediction of pathogenicity-related proteins has been done in various bacterial groups including strains with or without pathogenicity [8, 9]. Such analysis might not need a sophisticated strategy of genomic comparison. But identification of proteins that are conserved in a wide range of organisms that are not closely related phylogenetically requires solid clustering and phylogenetic profiling. Phylogenetic profiling with the Gclust database will be a powerful tool for identifying plant-specific proteins and proteins specific to flowering plants, when more plant genomic sequences become available.

References

[1] Abdallah, F., Salamini, F., and Leister, D., A prediction of the size and evolutionary origin of the proteome of chloroplasts of Arabidopsis, Trends Plant Sci., 5.
[2] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J., Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res., 25.
[3] Bansal, A.K. and Meyer, T.E., Evolutionary analysis by whole-genome comparisons, J. Bacteriol., 184.
[4] Cavalier-Smith, T., Genomic reduction and evolution of novel genetic membranes and protein-targeting machinery in eukaryote-eukaryote chimaeras (meta-algae), Phil. Trans. R. Soc. Lond., 358B.
[5] Eisen, J.A., Assessing evolutionary relationships among microbes from whole-genome analysis, Curr. Opinion Microbiol., 3.
[6] Emanuelsson, O., Nielsen, H., Brunak, S., and von Heijne, G., Predicting subcellular localization of proteins based on their N-terminal amino acid sequence, J. Mol. Biol., 300.
[7] House, C.H. and Fitz-Gibbon, S.T., Using homolog groups to create a whole-genomic tree of free-living organisms: An update, J. Mol. Evol., 54.
[8] Janssen, P.J., Audit, B., and Ouzounis, C.A., Strain-specific genes of Helicobacter pylori: Distribution, function and dynamics, Nucleic Acids Res., 29.
[9] Jin, Q., et al., Genome sequence of Shigella flexneri 2a: Insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157, Nucleic Acids Res., 30.
[10] Kabeya, Y. and Sato, N., Unique translation initiation at the second AUG codon determines mitochondrial localization of the phage-type RNA polymerases in the moss Physcomitrella patens, Plant Physiol., 138.
[11] Kaneko, T., et al., Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions, DNA Res., 3.
[12] Kaneko, T., et al., Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120, DNA Res., 8.
[13] Martin, W., Rujan, T., Richly, E., Hansen, A., Cornelsen, S., Lins, T., Leister, D., Stoebe, B., Hasegawa, M., and Penny, D., Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus, Proc. Nat. Acad. Sci. USA, 99.
[14] Matsuzaki, M., et al., Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D, Nature, 428.
[15] Richly, E. and Leister, D., An improved prediction of chloroplast proteins reveals diversities and commonalities in the chloroplast proteomes of Arabidopsis and rice, Gene, 329:11-16.
[16] Sato, N., SISEQ: Manipulation of multiple sequence and large database files for common platforms, Bioinformatics, 16, 2000.
[17] Sato, N., Was the evolution of plastid genetic machinery discontinuous?, Trends Plant Sci., 6.
[18] Sato, N., Comparative analysis of the genomes of cyanobacteria and plants, Genome Inform., 13.
[19] Sato, N., Gclust: Genome-wide clustering of protein sequences for identification of photosynthesis-related genes resulting from massive horizontal gene transfer, Genome Inform., 14.
[20] Sato, N. and Ishikawa, M., Identification of novel chloroplast proteins of endosymbiotic origin by phylogenetic profiling using homolog groups, Abstract Book of GIW2004, P139.
[21] The Arabidopsis Genome Initiative, Analysis of the genome sequence of the flowering plant Arabidopsis thaliana, Nature, 408.
[22] Wang, D. Y. C., Kumar, S., and Hedges, S. B., Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi, Proc. Biol. Sci., 266B.
[23]
[24]
[25]


I. Molecules & Cells. A. Unit One: The Nature of Science. B. Unit Two: The Chemistry of Life. C. Unit Three: The Biology of the Cell. I. Molecules & Cells A. Unit One: The Nature of Science a. How is the scientific method used to solve problems? b. What is the importance of controls? c. How does Darwin s theory of evolution illustrate

More information

Chapters 25 and 26. Searching for Homology. Phylogeny

Chapters 25 and 26. Searching for Homology. Phylogeny Chapters 25 and 26 The Origin of Life as we know it. Phylogeny traces evolutionary history of taxa Systematics- analyzes relationships (modern and past) of organisms Figure 25.1 A gallery of fossils The

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

Bacillus anthracis. Last Lecture: 1. Introduction 2. History 3. Koch s Postulates. 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes

Bacillus anthracis. Last Lecture: 1. Introduction 2. History 3. Koch s Postulates. 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes Last Lecture: Bacillus anthracis 1. Introduction 2. History 3. Koch s Postulates Today s Lecture: 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes 3. Phylogenetics I. Basic Cell structure: (Fig.

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

BIOLOGY STANDARDS BASED RUBRIC

BIOLOGY STANDARDS BASED RUBRIC BIOLOGY STANDARDS BASED RUBRIC STUDENTS WILL UNDERSTAND THAT THE FUNDAMENTAL PROCESSES OF ALL LIVING THINGS DEPEND ON A VARIETY OF SPECIALIZED CELL STRUCTURES AND CHEMICAL PROCESSES. First Semester Benchmarks:

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

CHAPTER 1 INTRODUCTION TO CELLS 2009 Garland Science Publishing 3 rd Edition

CHAPTER 1 INTRODUCTION TO CELLS 2009 Garland Science Publishing 3 rd Edition Unity and Diversity of Cells 1-1 The smallest unit of life is a(n) (a) DNA molecule. (b) cell. (c) organelle. (d) virus. (e) protein. CHAPTER 1 INTRODUCTION TO CELLS 2009 Garland Science Publishing 3 rd

More information

Biology Science Crosswalk

Biology Science Crosswalk SB1. Students will analyze the nature of the relationships between structures and functions in living cells. a. Explain the role of cell organelles for both prokaryotic and eukaryotic cells, including

More information

Study Guide: Fall Final Exam H O N O R S B I O L O G Y : U N I T S 1-5

Study Guide: Fall Final Exam H O N O R S B I O L O G Y : U N I T S 1-5 Study Guide: Fall Final Exam H O N O R S B I O L O G Y : U N I T S 1-5 Directions: The list below identifies topics, terms, and concepts that will be addressed on your Fall Final Exam. This list should

More information

Class IX: Biology Chapter 5: The fundamental unit of life. Chapter Notes. 1) In 1665, Robert Hooke first discovered and named the cells.

Class IX: Biology Chapter 5: The fundamental unit of life. Chapter Notes. 1) In 1665, Robert Hooke first discovered and named the cells. Class IX: Biology Chapter 5: The fundamental unit of life. Key learnings: Chapter Notes 1) In 1665, Robert Hooke first discovered and named the cells. 2) Cell is the structural and functional unit of all

More information