Supplementary Figure 1


Supplementary Figure 1 [figure]. Flow diagram. Labels: Input (species tree; genomic sequences; gene order, optional); Pre-processing; Identify orthogroups (SYNERGY algorithm); Output; Post-processing; Analysis (volatility groups, extended phylogenetic profiles, paralogous protein pairs, protein-interaction network, gene classes, functional data, regulatory modules).

Supplementary Figure 1. A schematic flow of the SYNERGY algorithm. SYNERGY receives as input a species tree, predicted protein sequences, the chromosomal order of genes, and the pairwise distances between sequences (left). SYNERGY partitions the input genes into orthogroups (right, ovals). Each orthogroup consists of genes in the extant species (white circles) that are predicted to descend from a single ancestral gene in the last common ancestor of these extant species, and is associated with a gene tree (inside each oval) that tracks the speciation, duplication (red star), and loss (blue strike) events in the gene's lineage.

Supplementary Figure 2 [figure]. Panel (a): OG #2268 (IME). Panel (b): OG #38 (RPS9A, RPS9B). Each panel shows the gene tree, the species tree (Spom, Anid, Mgri, Fgra, NCra, Ylip, Calb, Dhan, Kwal, Agos, Klac, Scas, Cgla, Sbay, Smik, Spar, Scer), the EPP, and the ECVP.

Supplementary Figure 2. Additional examples from the Ascomycota gene ancestry catalog. (a) Orthogroup #2268, an appearing orthogroup that contains the S. cerevisiae gene IME. The gene tree topology (left, black lines) differs from that of the species tree (right) as it is only supported by sequences from the clade spanning K. waltii and S. cerevisiae. The extended phylogenetic profile (EPP, center, numbered boxes) shows the gene copy number for all the species. The copy number variation profile (ECVP, right, numbered boxes) indicates the changes in gene copy number. An increase (+) is placed at the first ancestral species to which this orthogroup is traced (where it "appears"). The S. castellii orthologue in this orthogroup was lost (-, blue strike). (b) Orthogroup #38, a persistent (but not uniform) orthogroup that contains the S. cerevisiae genes RPS9A and RPS9B, which encode proteins of the small ribosomal subunit. The gene tree topology (left) differs from that of the species tree (center, right) as it includes both duplication (red star) and loss (blue strike) events. The EPP and ECVP for the orthogroup are shown in the center and right panels, respectively. The ECVP indicates an increase in copy number (+) due to gene duplication at the WGD and along the branch leading to S. pombe (red stars). One of the WGD paralogues was subsequently lost in C. glabrata (-, blue strike). Despite the loss event, the orthogroup contains at least one member gene in each extant species.
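The relationship between the two profiles described in this caption can be illustrated with a minimal sketch. It assumes that the EPP is available as a copy number for every node of the species tree (extant and ancestral) and that each ECVP entry is the difference between a node's copy number and its parent's. The function name, the dictionary layout, and the choice to compare the root against zero copies are illustrative assumptions, not taken from the SYNERGY implementation.

```python
def ecvp_from_epp(epp, parent):
    """Return copy-number changes per node, given copy numbers per node.

    epp    -- dict: node name -> gene copy number (extant and ancestral nodes)
    parent -- dict: node name -> parent node name (the root maps to None)
    """
    changes = {}
    for node, copies in epp.items():
        up = parent.get(node)
        # Assumption: a node without a parent is compared against zero copies,
        # so an orthogroup that "appears" there registers as an increase.
        baseline = epp[up] if up is not None else 0
        changes[node] = copies - baseline
    return changes


# Toy three-node tree (hypothetical, not read from the figure).
toy_parent = {"anc": None, "sp1": "anc", "sp2": "anc"}
toy_epp = {"anc": 1, "sp1": 2, "sp2": 0}
toy_ecvp = ecvp_from_epp(toy_epp, toy_parent)
```

In the toy tree, the duplication in `sp1` shows up as a positive entry and the loss in `sp2` as a negative entry, mirroring the (+) and (-) marks in the figure.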

Supplementary Figure 3 [figure]. Each panel shows the species tree over S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, C. glabrata, S. castellii, K. lactis, A. gossypii, K. waltii, D. hansenii, C. albicans, Y. lipolytica, N. crassa, F. graminearum, M. grisea, A. nidulans, and S. pombe.
(a) Legend: # appearances; # losses; # duplications; # ancestral genes.
(b) Legend: total # of appearances; # appearing orthogroups homologous to Cryptococcus neoformans genes; # ancestral genes.
(c) Legend: total # of appearances; # appearing orthogroups homologous to other orthogroups; # ancestral genes.
(d) Legend: total # of appearances; # appearing orthogroups homologous to either other orthogroups or C. neoformans genes; # ancestral genes.

Supplementary Figure 3. Summary of reconstructed evolutionary events and gene counts in the Ascomycota fungi. Shown is the species tree where each species is annotated (rectangles) with the number of known genes (for extant species) or inferred genes (for ancestral species). (a) Global statistics: Numbers on each branch denote the numbers of appearing (green), duplicated (red), and lost (blue) genes in the species from which the branch originated (see Figure 2f in main text for relative values). Note the large number of duplications in the WGD branch, the large number of losses in post-WGD branches, and the large numbers of appearances and losses in the Euascomycota. The large numbers of duplications in S. mikatae, S. paradoxus, and S. bayanus mostly stem from split open reading frame predictions (Supplementary Note 2). (b-d) Appearing orthogroups and their homologies to Cryptococcus neoformans genes and other orthogroups. Numbers on each branch denote the number of orthogroups appearing in this branch (green), and how many of those are significantly (E <.) homologous to genes in C. neoformans (b), in other orthogroups (c), or in either (d).

Supplementary Figure 4 [figure]. Panels (a), (b), and (c) are tree diagrams. Panel (d) lists the name corresponding to each module code:

1-1 stress_branch
1-2 development
1-3 development
1-4 flocculation
1-5 cell_cycle_cell_wall
1-6 puf2_puf4
1-7 DNA_replication
1-8 cell_wall
1-9 cell_wall_rlm
1-10 cell_wall_ace2_swi5
1-11 ACE2_SWI5
1-12 cell_separation
1-13 mating_filamentous_growth
1-14 filamentous_growth
1-15 mating_filamentous_growth_signaling
1-16 mating
1-17 karyogamy
1-18 stress_carbohydrate
1-19 drug_transporters
1-20 mitochondrial_biogenesis
1-21 mitochondrial_ribosome
1-22 response_to_metal_ions_cad
1-23 PP
1-24 transport
1-25 metal_ion_transport
1-26 amino_acid_drug_nucleotide_transport
1-27 MDRs
1-28 AA_transport
1-29 Ras_Starvation
1-30 glycogen_nitrogen_regulation
1-31 autophagy_vacuole
1-32 sugar_transport_vitamin_nad_metabolism_yap6_phd_nrg
1-33 vitamin_nad_metabolism
1-34 CIN5_YAP6_PHD_NRG_SOK2
1-35 sugar_transport
1-36 stress_carbohydrate_respiration
1-37 stress_carbohydrate
1-38 PPP_NADP
1-39 ROS
1-40 reserve_carbohydrate_trehalose
1-41 proteasome_protein_catabolism
1-42 Proteasome_RPN4
1-43 unfolded_protein_response_hsf
1-44 glycolysis_gluconeogenesis_gcr_gcr2
1-45 stress
1-46 SKN7_NRG
1-47 response_to_stress
1-48 glutathione
1-49 redox_detox
1-50 glutathione_metabolism
1-51 detox
1-52 detox_enzymes
1-53 YAP_YAP7_CAD_H2O2
1-54 redox
1-55 MSN24_H2O2
1-56 redox_homeostasis
1-57 carbohydrate_energy_reserves_msn2_hsf
1-58 trehalose
1-59 energy_reserves
1-60 glucose_metabolism
1-61 aerobic_metabolism
1-62 vacuole_lysis
1-63 ion_transport
1-64 aerobic_metabolism
1-65 gluconeogenesis
1-66 peroxisome_fa_oxidation
1-67 glycogen_glucose_repression
1-68 respiration
1-69 glyoxylate_cycle
1-70 TCA_cycle
1-71 oxidative_phosphorylation_hap4
1-72 ATP_synthase

2-1 growth_branch
2-2 PHO_vacuole
2-3 PHO
2-4 vacuole
2-5 phospholipid_metabolism_ino2_ino4
2-6 cell_cycle_meiosis
2-7 cyclins_cdks
2-8 FKH2_NDD
2-9 silencing
2-10 cohesin
2-11 DNA_replication_Cell_cycle_tx_regulation
2-12 NDD_FKH2_MCM
2-13 G_S_DNA_replication
2-14 histones_chromosome_organization
2-15 DNA_replication_DNA_repair
2-16 cell_cycle_tfs_cdks
2-17 G_S_TFs
2-18 CLN_CLB
2-19 mitosis_meiosis
2-20 microtubuli
2-21 mitotic_spindle
2-22 meiotic_recombination
2-23 sporulation
2-24 meiotic_mitotic_division
2-25 M_phase
2-26 sumoylation_ubiquitination
2-27 growth_aa_nitrogen_metabolism
2-28 AA_nitrogen_metabolism
2-29 serine_glycine_catabolism
2-30 CBF
2-31 nitrogen_metabolism
2-32 allantoin_dal_gat
2-33 nitrogen_starvation_asp
2-34 AA_metabolism
2-35 leu_val_ile_metabolism
2-36 purine_metabolism_bas
2-37 tryptophane_metabolism
2-38 glycine_serine_biosynthesis
2-39 methionine_sulfur_biosynthesis_met32_met4
2-40 alanine_aspartate_metabolism
2-41 aromatic_aa_biosynthesis
2-42 histidine_metabolism
2-43 aspartate_lysine_metabolism
2-44 general_aa
2-45 amine_metabolism
2-46 GLN
2-47 glutamine_metabolism
2-48 ornithine_metabolism
2-49 general_aa_gcn4
2-50 arginine_metabolism_urea_cycle
2-51 growth
2-52 PPP_FA_biosynthesis
2-53 PPP
2-54 FA_biosynthesis
2-55 sterol_lipid_biosynthesis_hap
2-56 pyrimidine_rcs
2-57 RNase
2-58 histones_puf5_hir
2-59 AA_metabolism_protein_glycosylation
2-60 trp_thr_pyrimidine_biosynthesis
2-61 protein_glycosylation
2-62 cell_organization
2-63 morphogenesis
2-64 septin_ring
2-65 bud
2-66 actin_patch
2-67 bud_site_selection
2-68 nuclear_transport
2-69 vesicle_secretion_er_golgi
2-70 Golgi
2-71 ER_protein_modification_targeting
2-72 protein_biosynthesis
2-73 exosome
2-74 a_purine_pyrimidine_bas
2-75 translation_elongation_regulation
2-76 nuclear_export
2-77 Pol_III_targets
2-78 ribosme_ribosome_biogenesis
2-79 preribosome
2-80 trna_aminoacylation
2-81 ribosome_biogenesis
2-82 trna_rrna_modification
2-83 trna_modification
2-84 rrna_metabolism
2-85 Pol_I_Pol_III
2-86 rrna_biogenesis_puf
2-87 ribosome_and_associated_proteins
2-88 translation_factors
2-89 ribosome_rap_sfp_fhl
2-90 RAP_SFP
2-91 FHL
2-92 ribosome

Supplementary Figure 4. A hierarchical modular organization of the yeast transcriptional system. (a) A functional hierarchy of S. cerevisiae transcriptional modules. By combining a large collection of functional gene sets and a compendium of ~,5 S. cerevisiae gene expression profiles (see Methods), we constructed a hierarchy of modules (nodes in tree), each consisting of genes with coherent function and expression. The tree is divided into two major groups: a Stress group (left) and a Growth group (right). A color-coded representation of the top 5 branches is shown in Fig. 4 in the main text. (b-d) Stress and Growth groups. A detailed representation of the modules in the Stress (b) and Growth (c) groups is shown. The biologically meaningful name corresponding to each module's code is given in panel (d).

Supplementary Figure 5 [figure]. Gain/loss coherence matrices. Panels: (a) tRNA spliceosome (MIPS), with genes mapped to orthogroups; (b) Protein Biosynthesis (GO); (c) Mitosis (GO); (d) 2S Proteasome (MIPS); (e) Cell Wall Organization and Biogenesis (GO).

Supplementary Figure 5. Coherent evolution of functionally related proteins. (a) Phylogenetic coherence of the tRNA spliceosome orthogroup class. The set of S. cerevisiae tRNA spliceosome genes (MIPS) was mapped to the set of orthogroups that contain these genes (black arrows, middle panel). Some orthogroups (e.g. #2862) contain multiple paralogues from the gene set. The ECVPs of all the orthogroups in the set are compiled into a matrix (left panel). Each row denotes one profile, and each column the copy number changes in one species (red: increase, blue: decrease, black: no change). The species (extant and ancestral) are ordered according to the order of the nodes in the species tree (top). The bottom row shows the coherence score for each column (purple: coherent), as evaluated by comparing the number of events to the distribution of events in the specific node within a random set of orthogroups of the same size (see Supplementary Methods). The overall significance of the coherence is reported as a P-value. (b-e) Phylogenetic coherence of the Protein biosynthesis, Mitosis, 2S Proteasome, and Cell wall organization and biogenesis orthogroup sets. Copy number variation coherence is presented as in (a). The copy number changes observed in the Protein biosynthesis (b), Mitosis (c), and 2S Proteasome (d) orthogroup classes are coherent. Those in the Cell wall organization and biogenesis orthogroup set are not coherent (e). Orthogroup classes are projected from the GO gene classes.

Supplementary Figure 6 [figure]. Panel (a) x-axis: distance in biochemical interaction network. Panel (b) x-axis: distance in genetic interaction network. y-axis in both panels: average # differences between ECVPs. Series: observed; random expected; random percentile bands.

Supplementary Figure 6. Coherent evolution of interacting proteins. Distance of genes in biochemical (a) and genetic (b) interaction networks (x-axis) is plotted versus the average difference between the ECVPs of pairs of genes of that distance or less in each network (y-axis). Pairs of paralogous genes are excluded from the computation of the averages. Black lines show the %, 5%, and 5% of the distribution of average distances from repetitions of this computation in networks with the same topology obtained by random reshuffling of the gene-to-profile associations (see Supplementary Methods). In both cases, similarity in ECVPs scales inversely with distance in the interaction network. Each network combines literature-curated results and high-throughput measurements. Similar results are obtained when only one source of data is used (data not shown).
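The computation summarized in this caption can be sketched as follows. This is a minimal illustration, assuming an unweighted, undirected network stored as an adjacency dictionary, ECVPs stored as equal-length tuples, and the "difference" between two ECVPs taken as the number of positions at which they differ. All function names, the data layout, and that difference measure are assumptions made for the example; the Supplementary Methods define the procedure actually used.

```python
import random
from collections import deque


def bfs_distances(adj, source, max_dist):
    """Shortest-path distances from source, explored up to max_dist."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_dist:
            continue  # do not expand beyond the distance cutoff
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def profile_difference(p, q):
    """Number of positions at which two equal-length profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def mean_difference_within(adj, profiles, max_dist, excluded=frozenset()):
    """Average profile difference over gene pairs at distance <= max_dist.

    excluded holds frozenset({g, h}) pairs to skip (e.g. paralogous pairs).
    """
    total = count = 0
    for g in profiles:
        for h, d in bfs_distances(adj, g, max_dist).items():
            # g < h counts each unordered pair once and skips g itself
            if d == 0 or h not in profiles or not g < h:
                continue
            if frozenset((g, h)) in excluded:
                continue
            total += profile_difference(profiles[g], profiles[h])
            count += 1
    return total / count if count else None


def shuffled_means(adj, profiles, max_dist, n_rep, seed=0):
    """Null values: reshuffle gene-to-profile associations, keep the topology."""
    rng = random.Random(seed)
    genes = list(profiles)
    values = list(profiles.values())
    out = []
    for _ in range(n_rep):
        rng.shuffle(values)
        out.append(mean_difference_within(adj, dict(zip(genes, values)), max_dist))
    return out


# Toy path network A - B - C with hypothetical two-column profiles.
toy_adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
toy_profiles = {"A": (1, 0), "B": (1, 0), "C": (0, 1)}
```

Percentile bands such as those drawn in the figure could then be read off the sorted output of `shuffled_means` for each distance cutoff.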

Supplementary Figure 7 [figure]. Histogram panels: (a) GO Function; (b) GO Process; (c) GO Component; (d) Transcription Modules; (e) MIPS; (f) DNA-TF; (g) RNA-PUF; (h) Motifs. Axis: rate of paralogous gene pair retention in gene class (binned). Legend: # orthogroups with paralogs.

Supplementary Figure 7. Distribution of paralogue retention rates in various classes and modules. Each panel shows the distribution of paralogue retention rates (in 10% bins) in particular types of functional classes or modules. At each bin, we distinguish the classes based on the number of orthogroups with paralogues per class or module. Note that if a category has only a single orthogroup with a paralogue, that paralogue could be either retained, placing the category in the 100% bin, or not, placing it in the 0% bin. (a) GO Molecular Function. (b) GO Biological Process. (c) GO Cellular Component. (d) Transcription modules. (e) MIPS complexes. (f) Targets of transcription factors. (g) Targets of RNA-binding (PUF) proteins. (h) Targets of cis-regulatory elements.

Supplementary Figure 8 [figure]. (a) Paralog gene set retention rates: % paralogous relations retained for species tree indices J, I, H, G, WGD, E, D, and all; series: GO Process, GO Component, GO Function, transcription modules. (b) Number of paralogous relations across the species tree, for the same indices. (c) Species tree indices D-J and WGD marked on the species tree (Hemiascomycota, Euascomycota, Archeascomycota).

Supplementary Figure 8. Retention rates of paralogous gene pairs in GO functional categories and transcriptional modules. (a) Shown are the percentages of paralogous pairs that have not migrated (were retained) in the hierarchies of GO Biological Process (red bars), GO Molecular Function (orange bars), GO Cellular Component (yellow bars), and the transcription modules (green bars). The retention rates are shown for all paralogues ("all") and for S. cerevisiae paralogues duplicated at different points in the tree ("D-J", "WGD"). Only paralogue pairs where both paralogues had an assigned annotation were considered in each case. (b) The number of paralogue pairs assessed in each case. (c) The phylogenetic indices D-J shown on the species tree.
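As a rough illustration of how a retention rate of this kind can be computed, the sketch below counts a paralogous pair as retained when the two genes share at least one class, and skips pairs in which either gene lacks an annotation. The "share at least one class" criterion is an assumption made for the example (the caption does not spell out how retention is scored within a hierarchy), and the function and variable names are hypothetical.

```python
def retention_rate(pairs, annotation):
    """Return (rate, n_considered) for a list of paralogous gene pairs.

    pairs      -- iterable of (gene_a, gene_b) tuples
    annotation -- dict: gene -> set of class or module labels
    """
    considered = retained = 0
    for a, b in pairs:
        classes_a = annotation.get(a)
        classes_b = annotation.get(b)
        # Only pairs where both paralogues are annotated are considered.
        if not classes_a or not classes_b:
            continue
        considered += 1
        # Assumption: "retained" means the two genes share at least one class.
        if classes_a & classes_b:
            retained += 1
    rate = retained / considered if considered else None
    return rate, considered


# Toy data (hypothetical gene and class names).
toy_pairs = [("a", "b"), ("c", "d"), ("e", "f")]
toy_annotation = {"a": {"X"}, "b": {"X", "Y"}, "c": {"X"}, "d": {"Y"}, "e": {"X"}}
toy_rate, toy_n = retention_rate(toy_pairs, toy_annotation)
```

Returning the number of considered pairs alongside the rate corresponds to reporting panel (b) next to panel (a).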

Supplementary Figure 9 [figure]. Heat maps, each with a side panel showing # genes in class (or module) and # orthogroups with paralogs.
(a) Paralog migration between GO process gene classes.
(b) Paralog migration between GO function gene classes.
(c) Paralog migration between GO component gene classes.
(d) Paralog migration between transcription factor target gene classes.
(e) Paralog migration between cis-regulatory element target gene classes.
(f) Paralog migration between transcription modules. Marked branches: Development; Stress & carbohydrate metabolism; Cell cycle & meiosis; Amino acid & nitrogen metabolism; Fundamental processes.

Supplementary Figure 9: Heat maps of paralogue retention and migration across gene classes and modules. The rows and columns of each matrix are classes or modules. In each matrix, the same row and column ordering is used. For the transcription modules, this order also follows the order of the module hierarchy. Each entry on the diagonal indicates the number of paralogous pairs where both paralogues are retained within the module. Off-diagonal entries indicate the number of paralogous gene relations bridging the gene classes or modules. The right panel indicates the size of the class or module and the number of orthogroups with paralogues within it (gray and orange bars, respectively). (a-c) GO Biological Process (a), GO Molecular Function (b), and GO Cellular Component (c) all have high retention rates (strong diagonal entries vs. weak off-diagonal entries). (d-e) Targets of transcription factors (d) and targets of the same cis-regulatory element (e) all have low retention rates (weak diagonal entries). Transcription modules (f) show an intermediate pattern. The five main branches of the module hierarchy are marked. Transcriptional modules do not exhibit high migration rates between modules in the fundamental process and AA metabolism branches (yellow rectangle). In all cases, we do not observe the formation of clear paralogous modules, as only a few paralogues at most connect any pair of classes or modules. A quarter of the paralogue migrations occur across the growth and stress groups, consistent with the high migration rates of paralogous genes under the same regulatory mechanisms. This may also explain the observed partitioning in co-expression following the WGD but is not unique to that event (data not shown). While many additional migrations occur between different modules within the stress group of modules, some branches of the module hierarchy exhibit very few migrations within them. In particular, there are few paralogous relations between different modules in the fundamental processes branch, suggesting that paralogue migrations among these modules tend to result in a drastically different transcriptional program.

Supplementary Figure 10 [figure]. Panels (a-d): biochemical interaction network. Panels (e-h): genetic interaction network. (a,e) Distribution of fractions of shared interacting partners (x-axis: fraction of shared interactions). (b,f) Distribution of conserved interacting partners significance (x-axis: P-values). (c,g) Distribution of subfunctionalization indices (x-axis: index of subfunctionalization). (d,h) Distribution of conserved interacting partners significance (x-axis: P-values). Each panel compares pairs of paralogues with random pairs.

Supplementary Figure 10: Conservation and divergence in biochemical and genetic interaction networks. In all analyses, only pairs with both paralogues appearing in the interaction network were considered. (a,e) Distribution of the fraction of shared interaction partners between pairs of paralogues (blue). Approximately half of the paralogue pairs have no shared interactions in the biochemical network, and approximately a third have none in the genetic network. The expected distribution (based on random pairs) is shown in red. (b,f) Distribution of p-values estimating the significance of the fraction of shared interaction partners between pairs of paralogues (blue). Almost half (46% in biochemical, 48% in genetic networks) of the paralogue pairs have significantly (P<.5) conserved sets of partners, much more than expected for random pairs (red). (c,g) Distribution of subfunctionalization indices for pairs of paralogues (blue), as computed by a previously proposed measure. No significant difference from the expected distribution (based on random pairs) is observed, with the notable exception of paralogous pairs with no divergence (a zero index). (d,h) Distribution of p-values estimating the significance of the subfunctionalization indices between pairs of paralogues (blue). Approximately 37% of the paralogous pairs have significantly (P<.5) conserved sets of partners in both biochemical and genetic networks, much more than expected for random pairs (red).
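The two quantities in panels (a,e) and (b,f) can be illustrated with a small sketch. The caption does not state how the shared fraction or its p-value were computed, so both choices below are assumptions made for the example: the fraction is taken as shared partners over the union of partners (excluding the two paralogues themselves), and the p-value is a hypergeometric upper tail. The function names are hypothetical.

```python
from math import comb


def shared_partner_fraction(adj, a, b):
    """Shared partners of a and b divided by the union of their partners."""
    partners_a = set(adj.get(a, ())) - {a, b}
    partners_b = set(adj.get(b, ())) - {a, b}
    union = partners_a | partners_b
    return len(partners_a & partners_b) / len(union) if union else 0.0


def overlap_pvalue(n_genes, n_a, n_b, n_shared):
    """Hypergeometric upper tail P(overlap >= n_shared).

    n_genes -- number of candidate partner genes in the network
    n_a     -- number of partners of the first gene
    n_b     -- number of partners of the second gene
    """
    total = comb(n_genes, n_b)
    tail = sum(
        comb(n_a, i) * comb(n_genes - n_a, n_b - i)
        for i in range(n_shared, min(n_a, n_b) + 1)
    )
    return tail / total


# Toy network (hypothetical gene names): a and b interact and share partner y.
toy_net = {"a": ["x", "y", "b"], "b": ["y", "z", "a"]}
toy_fraction = shared_partner_fraction(toy_net, "a", "b")
```

An empirical alternative to the hypergeometric tail would be to compare each paralogue pair against the distribution obtained from random gene pairs, which is closer to how the red curves in the figure are described.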

Supplementary Figure 11 [figure]. Interaction network diagrams. (a) Conservation: Gbp2/Hrb (biochemical). (b) Divergence: Hst/Sir2 (biochemical). (c) Split: Phd/Sok2 (genetic).

Supplementary Figure 11: Functional conservation and innovation of paralogues in classes and networks. (a-c) Following duplication, approximately half of the paralogue pairs significantly conserve their interaction partners (a), whereas the rest have no shared partners (b), although they may directly interact with each other (c), leading to modularization. In each network, node color indicates a distinct biological process.

Supplementary Figure 12 [figure]. (a) Diagram of the thioredoxin system at successive stages, showing Trr, Trr2, Trx, Trx/2, and Trx3 cycling between oxidized and reduced states with NADPH, H+ in the cytoplasm and the mitochondrion. (b) The same stages placed along the species phylogeny.

Supplementary Figure 12: Formation of paralogous thioredoxin modules. (a) Shown is the reconstructed formation of two paralogous thioredoxin systems 64, one in the cytoplasm (right, top) and one in the mitochondrion (right, bottom), from a single ancestral system (left). This happened by two independent duplication events. The first event (center) occurred at the last common ancestor of the Hemiascomycota fungi, where a single ancestral thioredoxin gene (pink circle, left) was duplicated, with one paralogue specializing to the cytoplasm (Trx/2) and the other to the mitochondrion (Trx3). At this point, both systems shared a single thioredoxin reductase gene (Trr, orange oval). In the second step (right), taking place along with the WGD, the thioredoxin reductase was duplicated as well, resulting in a divergent pair of cytoplasmic and mitochondrial thioredoxin reductase genes (Trr and Trr2, orange ovals), and two distinct paralogous modules. (b) The progression of evolutionary events of this module is indicated along the species phylogeny. Duplications are indicated by red stars along the branches of the tree.

doi:.38/nature67 SUPPLEMENTARY INFORMATION

Supplementary Note : Quality of genomic data sources

We found our genomic data sources to be of varying quality. Our results revealed three phenomena that suggest possible faults in open reading frame (ORF) predictions from these data: (1) many genes were not associated with any orthologue among all the other species in our data ("singletons"), (2) many adjacent ORFs are likely to have been fragments of a single ORF which was divided into multiple segments by automated gene prediction programs ("segmented ORFs"), and (3) a substantial number of likely open reading frames were not among the annotated sets of genes of the published genomes. We address these three separately below, and assert that SYNERGY is able to refine predicted gene complements by attempting to place them into orthogroups, removing singletons, merging segmented ORFs, and recovering likely un-predicted genes.

(1) Singletons

We identified the following numbers of predicted ORFs for which no orthology relations were determined by our algorithm:

Species    # Singletons    % Total predicted ORFs    # Non-Singleton ORFs
S. cerevisiae 6 2.8% 5624
S. paradoxus % 5955
S. mikatae % 6262
S. bayanus % 68
C. glabrata % 4969
S. castellii 4 2.5% 5455
K. lactis % 498
A. gossypii 95 2.% 463
K. waltii % 4946
D. hansenii % 5953
C. albicans % 5778
Y. lipolytica % 567
N. crassa % 7377
F. graminearum % 927
M. grisea % 7867
A. nidulans % 7568
S. pombe % 4354
Total % 2,

While we acknowledge that some singleton orthogroups consist of genes with true orthologues in other species (false negatives), we believe that a significant portion of these singletons are cases of incorrect open reading frame predictions, given our low threshold for considering sequences significantly similar (E<.; Supplementary Note ). We note that while the number of open reading frames predicted for each of these species spans a range of more than 2-fold, from 4,723 (A. gossypii) to,64 (F. graminearum), this range is reduced for the number of open reading frames belonging to non-singleton orthogroups. Furthermore, only 2.8% of the genes in the well-annotated S. cerevisiae genome are singletons. The genomes in red font (S. paradoxus, S. mikatae, and S. bayanus) were all obtained from the same source and were simultaneously published 2. Our inspections of these sequences suggest that many ORFs are dubious predictions and hence appear as singletons in our data. For example, many predicted ORFs code for only one amino acid (e.g. ORFs 237, YNL334C-33, and YMR67W-892 in S. bayanus), and many more are very short, dubious peptide sequences. In addition, the abundance of singleton ORFs among the Euascomycota (green font) suggests a high degree of genome expansion and innovation among these species, and is also reflected in the number of appearing orthogroups in this clade of species. This phenomenon has been previously reported within this clade 6. Taken together, the singletons for these seven species account for over 72% of all singletons found.

(2) Segmented Open Reading Frames

Our results revealed a large number of local duplications (occurring in only a single species) along the branches of the species tree leading to S. paradoxus (28), S. mikatae (865), and S. bayanus (49). As noted above, we believe that the ORF predictions published for these species may contain significant faults. Our inspections of these duplicate genes found that many come from short, adjacent ORFs along contigs, which together span the length of a single ORF in other closely related species (e.g. S. cerevisiae). For example, the ORFs YLR223C-4993 and YLR223C-4989 in S. mikatae are likely to be two mis-predicted partial ORFs. Indeed, the SYNERGY algorithm shows these and others like them as resulting from a local duplication event (Orthogroup #982, Fig 2d in main text). This observation explains the relatively large number of predicted ORFs in these species, which is much greater than the number published for the model organism S. cerevisiae, which belongs to the same sensu stricto Saccharomyces clade.

(3) Missed open reading frames

We can potentially recover un-annotated genes from sequenced genomes by taking orthogroups that exhibit gene loss events at the leaves of the species tree and using their constituent genes to search for likely open reading frames within the chromosomes (or contigs) of the species in which the loss was inferred. We used considerably restrictive criteria for identifying intergenic locations that potentially contain a coding region. First, we used TBLASTN to query the respective genomes for regions that are significantly similar to an orthogroup's members. If such a region was identified whose E-value was below - and whose alignment spanned at least 8% of the query gene's length, we considered this a missed annotation. The numbers of regions satisfying these criteria per species are summarized below, and we include these predictions in our website. We found many such missed annotations to belong to relatively short genes, potentially explaining why automated gene prediction programs would not have identified them (e.g. many short ribosomal proteins are absent from the gene annotations of C. glabrata and K. waltii).

Species    # Losses    # Likely recovered ORFs    % of losses recovered
S. cerevisiae %
S. paradoxus %
S. mikatae %
S. bayanus %
C. glabrata %
S. castellii %
K. lactis %
A. gossypii %
K. waltii %
D. hansenii %
C. albicans %
Y. lipolytica %
N. crassa %
A. nidulans %
Total 9,95 92 %

We do not have these results for F. graminearum or M. grisea because their chromosome sequences are not available. Because we did not validate this procedure for finding potentially omitted ORF annotations with experimental data, we chose to exclude these recovered ORFs from our analysis. We did confirm that the addition of these ORFs would not have affected our analysis of persistent or uniform gene classes. Taken together, these results suggest that SYNERGY provides an important functionality (complementary to its original goal) as a comprehensive approach to improving ORF predictions for newly sequenced genomes.
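The two filtering criteria described for missed open reading frames (an E-value cutoff and a minimum fraction of the query covered by the alignment) can be sketched as a filter over tabular TBLASTN hits. The sketch assumes the standard 12-column tab-separated BLAST output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and a separate dictionary of query protein lengths. The function name is hypothetical, and both thresholds are left as required parameters; the values in the usage example are placeholders, not the cutoffs used in this study.

```python
def candidate_missed_orfs(hit_lines, query_lengths, max_evalue, min_query_coverage):
    """Keep tabular TBLASTN hits that pass an E-value and a coverage cutoff.

    hit_lines          -- iterable of tab-separated hit lines (12 standard columns)
    query_lengths      -- dict: query id -> query protein length (amino acids)
    max_evalue         -- hits must have an E-value strictly below this value
    min_query_coverage -- minimum fraction of the query spanned by the alignment
    """
    kept = []
    for line in hit_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        # Fraction of the query protein covered by this alignment.
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if evalue < max_evalue and coverage >= min_query_coverage:
            kept.append((qseqid, sseqid, evalue, coverage))
    return kept


# Toy hits (hypothetical identifiers and values).
toy_hits = [
    "q1\tchrA\t90.0\t95\t5\t0\t1\t95\t1000\t1285\t1e-30\t200",
    "q1\tchrB\t40.0\t30\t10\t0\t10\t39\t500\t590\t1e-30\t50",
    "q2\tchrA\t80.0\t50\t5\t0\t1\t50\t10\t160\t0.5\t20",
]
toy_lengths = {"q1": 100, "q2": 50}
# Placeholder thresholds chosen only for this example.
toy_kept = candidate_missed_orfs(toy_hits, toy_lengths, max_evalue=1e-5, min_query_coverage=0.9)
```

A further step, not shown here, would be to check that each retained region falls in an intergenic location of the target genome, as the text requires.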

Supplementary Note 2: Measuring Orthogroup Confidence

In order to assess SYNERGY's robustness to variations in the input data, we applied two measures to our resulting orthogroup assignments under two kinds of perturbations. We describe below a bootstrap-based confidence measure that gauges SYNERGY's consistency when portions of the data used to construct orthogroups are removed.

Performance Summary

The non-singleton orthogroups SYNERGY obtained are remarkably robust to a systematic perturbation of the set of included species (Figure; 93.5% are complete and 99.7% are sound at 8% confidence level). The orthogroups are also highly robust to changes in gene content. When removing up to 2% of the genes in each genome at random, 96.3% of the orthogroups are complete, and 78% are sound (this latter value indicates that some orthogroups were joined into a single orthogroup following the perturbation, somewhat reducing accuracy, as expected).

Measuring Orthogroup Confidence

To empirically measure SYNERGY's sensitivity to the specifics of a given dataset, we developed a bootstrap-based approach. By repeatedly excluding different portions of the data, we measure orthogroup robustness to (1) the choice of species included and (2) the accuracy of gene predictions within each species. We estimate a species confidence score by systematically hiding each branch of the species tree T and running SYNERGY separately on both subtree partitions that result from removing that branch. We calculate a gene confidence score by randomly withholding a proportion of genes from each genome repeatedly, ensuring that every gene is held out at least once with very high probability.

[Figure: four histogram panels (Branch-Holdout Soundness, Branch-Holdout Completeness, Gene-Holdout Soundness, Gene-Holdout Completeness); x-axis: Bootstrap Confidence Value; y-axis: # Orthogroups.]

Figure: Shown are the distributions of four robustness measures across ,3 non-singleton orthogroups. To assess orthogroup robustness, we applied SYNERGY to different subsets of the organisms (branch-holdout, orange and yellow plots), and genes (gene-holdout, blue and purple plots). We then examine for each orthogroup how many of the orthologous pairs of genes in the orthogroup were identically identified under each perturbation (soundness, yellow and blue plots) and how many wrong new relations were added to the orthogroup (completeness, orange and purple plots). The majority of the orthogroups are robust to systematic perturbations in both species content and gene content. The orthogroups are most sensitive according to the gene-holdout soundness measure. This is due to the merger of several orthogroups in the perturbed conditions.

For both species and gene confidence, we wish to test the soundness and completeness of the identified orthogroups. Recall that a complete orthogroup contains all genes that descended from a single common ancestor, and thus its genes should not migrate out of it in the holdout experiments. To test this, we count the number of orthologous gene pairs $(g_j, g_k)$ in an orthogroup $OG_i$ that remained orthologous in a holdout experiment. We compute $\eta_i^c$ for each orthogroup $OG_i$ as the fraction of its orthology assignments that remained constant across our set of experiments $H$:

$$\eta_i^c = \frac{\left|\{(g_j, g_k) \in OG_i,\ h \in H : h(g_j, g_k) = OG_i(g_j, g_k)\}\right|}{|H| \cdot \left(|OG_i|\,(|OG_i| - 1)\right)/2} \qquad (1)$$

where $h(g_j, g_k)$ and $OG_i(g_j, g_k)$ specify the precise point in the species tree with respect to which $g_j$ and $g_k$ are orthologous (this is set to a negative sentinel value if $g_j$ and $g_k$ are not members of the same orthogroup). (We must account for the fact that some assignments are bound to change when genes within an orthogroup are among those hidden.)

A sound orthogroup contains only the genes that descended from a single common ancestor, and thus new genes should not migrate into the orthogroup in the holdout experiments. We use a similar formula to obtain $\eta_i^s$, except that we count the number of pairs of non-orthologous genes $(g_j, g_k)$, $g_j \in OG_i$, $g_k \notin OG_i$, that became orthologous in the holdout conditions $H$. Since pairs of genes that share no protein sequence similarity are highly unlikely to be considered orthologous in $H$, we restrict our tests to gene pairs that can be loosely regarded as similar (E <.), rendering this task computationally feasible. We calculate an empirical bootstrap confidence value $\eta_i^s$ for orthogroup $OG_i$ as:

$$\eta_i^s = \frac{\left|\{(g_j, g_k) \notin OG_i,\ h \in H : h(g_j, g_k) \neq OG_i(g_j, g_k)\}\right|}{|H| \cdot \left|\{(g_j, g_k) : g_j \in OG_i,\ g_k \notin OG_i\}\right|} \qquad (2)$$

Our confidence measures, $\eta_i^c$ and $\eta_i^s$, can be computed for both species- and gene-holdout experiments, giving us four robustness measures for each orthogroup.
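The two pair-counting fractions above can be illustrated with a minimal Python sketch. It is not SYNERGY's implementation: the data layout (a dictionary mapping each gene pair to the label of the species-tree node at which the pair is orthologous, with a missing key standing for "not in the same orthogroup"), the function names, and the toy gene and node labels are all assumptions made for illustration. The sketch also ignores the correction for pairs whose genes were themselves hidden in a holdout. The second function returns the raw fraction of cross-boundary pairs whose assignment changed, following the counting described in the text.

```python
from itertools import combinations


def pair_key(g_j, g_k):
    """Order-independent key for a gene pair (hypothetical helper)."""
    return frozenset((g_j, g_k))


def completeness(og_genes, baseline, holdouts):
    """Fraction of within-orthogroup pair assignments that stay identical
    across all holdout experiments (the quantity in equation 1)."""
    pairs = [pair_key(a, b) for a, b in combinations(sorted(og_genes), 2)]
    if not pairs or not holdouts:
        return None
    unchanged = sum(
        1 for h in holdouts for p in pairs if h.get(p) == baseline.get(p)
    )
    # len(pairs) equals |OG|(|OG| - 1)/2, the number of unordered gene pairs.
    return unchanged / (len(holdouts) * len(pairs))


def changed_outside_fraction(og_genes, similar_outsiders, baseline, holdouts):
    """Fraction of (member, non-member) pairs whose assignment changed in the
    holdouts, restricted to outsiders pre-filtered for sequence similarity."""
    pairs = [
        pair_key(g, o)
        for g in sorted(og_genes)
        for o in sorted(similar_outsiders)
        if o not in og_genes
    ]
    if not pairs or not holdouts:
        return None
    changed = sum(
        1 for h in holdouts for p in pairs if h.get(p) != baseline.get(p)
    )
    return changed / (len(holdouts) * len(pairs))


# Toy data: one orthogroup {a, b, c}, two similar outsiders, two holdouts.
og = {"a", "b", "c"}
baseline = {
    pair_key("a", "b"): "N1",
    pair_key("a", "c"): "N2",
    pair_key("b", "c"): "N2",
}
h1 = dict(baseline)                      # nothing changed
h2 = {
    pair_key("a", "b"): "N1",
    pair_key("a", "c"): "N2",            # the (b, c) relation was lost
    pair_key("a", "x"): "N3",            # outsider x migrated in
}
holdouts = [h1, h2]

eta_c = completeness(og, baseline, holdouts)
eta_s_raw = changed_outside_fraction(og, {"x", "y"}, baseline, holdouts)
print(eta_c, eta_s_raw)
```

If a soundness score is wanted in which higher values mean more robust orthogroups, one option (an assumption here, not something stated in the text) is to report one minus the changed fraction.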

Supplementary Note 3: Assessing fungal orthogroup accuracy

In order to estimate the accuracy of our orthology assignments, we benchmarked SYNERGY's assignments against two previously published curated orthology annotations for a subset of the species included in our analyses, as well as against data generated through simulated evolutionary processes that included amino acid substitutions as well as gene duplications and losses. We report a summary of these tests here, in terms of specificity and sensitivity. Importantly, while we compare our results with manually annotated gold standards, such annotations are themselves putative, and may contain missing or erroneous assignments. We discuss the potential limitations of each specific dataset below.

Comparison to the Yeast Gene Order Browser

The Yeast Gene Order Browser (YGOB) is an online tool for visualizing orthologous loci for a collection of 6 pre- and post-whole genome duplication species (ref. 33). Its orthology assignments depend heavily on the chromosomal positioning of genes. Overall, this curated resource contains both orthology and paralogy assignments that are highly specific and supported by strong evidence. However, it is also limited in scope: it assumes that orthology relations are at most 1-to-2 relations, and the WGD is the only duplication event it can capture. It does not report orthology assignments that are not supported by their syntenic context. YGOB reports its results as ancestral loci. Each ancestral locus "pillar" contains all the orthologues that descend from a single gene in the last common ancestor, and clearly distinguishes the descendants of the WGD duplicate copies.

SYNERGY captured 88%-97% of the orthology assignments made by YGOB (Figure, orange entries) and 88-9% of the paralogy assignments (Figure, blue entries). As for specificity, 8-96% of the orthology assignments made by SYNERGY were also covered by YGOB. This somewhat lower specificity in SYNERGY's reported orthology relations is largely due to the limitations in YGOB's scope described above. In particular, YGOB does not consider any additional duplications besides the WGD, although such local duplications have certainly occurred.

Figure: Comparison of SYNERGY's automated orthology assignments to those of the YGOB curated database. Orange off-diagonal cells indicate the percent of YGOB orthologues that were also captured by SYNERGY for each pair of species used by YGOB and in our analyses (sensitivity). The percent of SYNERGY's assignments also consistent with YGOB are reported in the green cells (specificity). The blue diagonal cells show the percent of YGOB same-species paralogue relations that were also detected by SYNERGY.

Comparison to manual curation of PombeDB

The curators of the online Schizosaccharomyces pombe genome annotation database (ref. 65) have recently provided their users with a stable set of orthology assignments between S. pombe and S. cerevisiae open reading frames. These assignments were constructed over the course of six years by manual inspection of alignments by the curators. These curators assigned S. cerevisiae orthologues for 3,72 S. pombe gene products (including one-to-many and many-to-many relations). They also claim that 482 additional gene products are conserved in other organisms but have no apparent S. cerevisiae orthologue.

There were 4,493 orthology assignments shared between PombeDB and SYNERGY, comprising 72% of PombeDB's assignments and 8% of SYNERGY's total assignments (between S. cerevisiae and S. pombe). We note that the two species S. cerevisiae and S. pombe span the entire phylogenetic range of our study, and virtually no shared gene order is present between these species, making this orthology assignment task among the most difficult we have tackled. We also note that many of the 3,72 S. pombe genes are assigned one-to-many or many-to-many orthologues in PombeDB, and that SYNERGY has likely separated some of those into distinct orthogroups.

Simulated Orthogroups

To obtain an objective measure of SYNERGY's accuracy, we simulated orthogroup evolution including multiple rounds of speciation events and with pre-specified rates of gene duplication and loss. At each stage, we used the SEQ-GEN program (ref. 66) to simulate the evolution of protein sequences using the JTT model of amino acid substitution (ref. 52). In order to make these simulations as true to fungal protein sequences as possible, we initiated the simulations with varying numbers of randomly drawn sequences from S. cerevisiae. For the purposes of this benchmark, we ignored the chromosomal ordering of the simulated sequences, since to the best of our knowledge there is no generally agreed-upon model for chromosomal evolution. As a result, this test evaluates SYNERGY's performance when no synteny information is considered.

We parameterized our simulations as follows: Using a balanced species tree topology containing 6 species, we gave each orthogroup a probability of . of incurring a duplication or loss along every branch of the species tree. These duplication and loss rates are relatively high, but we were interested in examining how well SYNERGY performs under such volatile conditions. The rates of amino acid substitution between orthologues were specified by the branch lengths in the simulated gene trees. These lengths were drawn from an exponential distribution with a mean of .36 (approximately the mean branch length in the fungal species tree we used).

SYNERGY accurately detected over 85% of the orthologous relations in our simulated data sets of various sizes (Figure 2). Further, its specificity was remarkably high (nearly 99%) despite the presence of many paralogues in the genome from which the simulated sequences were originally drawn. We believe that the sensitivity could have been further improved had we included chromosomal order in these simulations, allowing SYNERGY to predict paralogues more robustly. Importantly, we found no significant trend suggesting that the number of sampled sequences affected SYNERGY's overall performance in these simulations. While we recognize that the implications of such benchmarks should be carefully interpreted, we believe that these simulations accurately reflect SYNERGY's strong performance on data that is based on a reasonable model of fungal sequence evolution.

Figure 2: Benchmarks on Simulated Orthogroups. The percent of orthologues identified (Sensitivity) and the percent of correct orthology assignments (Specificity) across simulation trials with varying numbers of orthogroups.
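The sensitivity and specificity figures quoted in this note are fractions of shared pairwise assignments. The Python sketch below shows one way such fractions could be computed from two sets of gene groups. It is an illustration under stated assumptions, not the benchmark code used for this study: the function names and the toy gene identifiers are hypothetical, and the sketch counts every co-grouped pair rather than separating between-species orthologues from same-species paralogues. Note that "specificity" as defined here (the share of predicted assignments that the reference also makes) corresponds to what is often called precision.

```python
from itertools import combinations


def pair_set(groups):
    """All unordered gene pairs that share a group (hypothetical helper)."""
    pairs = set()
    for genes in groups:
        for a, b in combinations(sorted(genes), 2):
            pairs.add((a, b))
    return pairs


def sensitivity_specificity(predicted_groups, reference_groups):
    """Sensitivity: share of reference pairs recovered by the prediction.
    Specificity (as used in this note): share of predicted pairs that the
    reference also contains."""
    predicted = pair_set(predicted_groups)
    reference = pair_set(reference_groups)
    shared = predicted & reference
    sensitivity = len(shared) / len(reference) if reference else None
    specificity = len(shared) / len(predicted) if predicted else None
    return sensitivity, specificity


# Toy example with made-up gene identifiers.
reference_groups = [{"sc1", "sp1"}, {"sc2", "sc3", "sp2"}]
predicted_groups = [{"sc1", "sp1"}, {"sc2", "sp2"}, {"sc3", "sp3"}]

sens, spec = sensitivity_specificity(predicted_groups, reference_groups)
print(sens, spec)
```

In this toy case the prediction splits a one-to-many reference group into two orthogroups, which lowers sensitivity in the same way the text suggests for the one-to-many PombeDB assignments.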

Supplementary Note 4: The loss and appearance of genes essential for S. cerevisiae growth

Genes and modules essential for S. cerevisiae growth coincide specifically with uniform orthogroups (45/47 essential genes are in uniform orthogroups, P<3X -7 ). This suggests that it is not only difficult to lose an essential gene (as expected) but also to gain a new essential gene or to duplicate one. Of the essential genes in S. cerevisiae, 36 are not found in ancestral orthogroups, suggesting that in some instances new essential functions can be gained. In addition, 62 essential genes are ancestral but are absent from at least two genomes. The associated essential function of those genes was either lost as well, or was likely adopted by another gene. Here we discuss notable examples of both classes.

Examples of essential genes that are not ancestral

There are 36 essential S. cerevisiae genes that are not ancestral. They cover a wide variety of functions, which often involve many other essential genes that are ancestral. With few exceptions, most do not have distant homologues within our orthogroup catalog, other fungi or metazoa, so it is difficult to postulate the origin of these genes. Below we consider selected examples, all of which appear to be true innovations (unless stated otherwise). All examples were selected from orthogroups that appeared in the clade spanning K. waltii and S. cerevisiae, unless noted otherwise. The annotations of all genes are quoted from SGD.

SNU56 (YDR24C), Orthogroup #29687
Component of U snRNP required for mRNA splicing via spliceosome. The lack of any similar sequence prevents us from making any putative assignment of the origin of this sequence.

CDC3 (YDL22C), Orthogroup #298
Single stranded DNA-binding protein found at TG-3 telomere G-tails; regulates telomere replication through recruitment of specific sub-complexes, but the essential function is telomere capping. The innovation at this particular phylogenetic point is consistent with a change in the telomeric repeat sequence at the same point in the phylogeny (ref. 67), and with the loss of the ancestral protein Pot (Orthogroup #686), which fulfills the same function in animals, S. pombe and the Euascomycota.

KRE29 (YER38C), Orthogroup #2234
Essential subunit of the Mms2-Smc5-Smc6 complex; protein of unknown function; required for growth and DNA repair. There are 5 subunits in this complex: Mms2, Smc5 and Smc6 are all from persistent orthogroups, but the fifth subunit, SNE5 (YML23C, Orthogroup #225), also appears at the same point in the phylogeny as KRE29.

RTT5 (YER4W), Orthogroup #
Protein with a role in regulation of Ty transposition. Since most mechanisms controlling repetitive elements in S. pombe and the Euascomycota were lost in the Hemiascomycota, this alternative mechanism may have become essential.

RFA3 (YJL73C), Orthogroup #2997
Subunit of heterotrimeric Replication Factor A (RF-A), which is a highly conserved single-stranded DNA binding protein involved in DNA replication, repair, and recombination. This novelty is very surprising, as this is a subunit of the SSB (replication factor A, single strand DNA binding protein) of S. cerevisiae. Because this is a short protein (22 amino acids), we cannot rule out that the gene was not predicted in the other genomes (and is therefore not within the input set of open reading frames).

MPS2 (YGL75C), Orthogroup #222
Essential membrane protein localized at the nuclear envelope and spindle pole body (SPB), required for insertion of the newly duplicated SPB into the nuclear envelope. Spindle related functions are generally enriched for appearing genes (see main text).

GCR (YPL75W), Orthogroup #
Transcriptional activator of genes involved in glycolysis; DNA-binding protein that interacts and functions with the transcriptional activator Gcr2p. Null mutant has a severe growth defect when grown in the presence of glucose, but grows quite well on medium with non-fermentable carbon sources; on permissive medium, the null mutant principally affects the expression of glycolytic enzyme genes and transcripts encoded by Ty elements (ref. 68). Mutant exhibits reduction in the intracellular concentration of enolase and glyceraldehyde-3-phosphate dehydrogenase polypeptides. Both GCR and GCR2 (which is not essential) appear at the same phylogenetic point, and neither has clear similarity to other orthogroups (or to each other). However, both show very distant similarity to several other transcription factors (e.g. Hot, Msn), so they are likely the result of duplication and very fast divergence of one of them. Nevertheless, this suggests a substantial innovation in glycolytic protein gene expression.

MED (YMR2C), Orthogroup #
Appeared in the clade spanning Y. lipolytica and S. cerevisiae. Essential subunit of the RNA polymerase II mediator complex; associates with core polymerase subunits to form the RNA polymerase II holoenzyme. Because this is a relatively short protein, we cannot rule out that the gene was not predicted in the other genomes (and is therefore not within the input set of open reading frames).

Examples of ancestral essential genes that were lost in two or more species

There are 62 ancestral genes that were completely lost in two or more species and are essential for S. cerevisiae growth in rich YPD medium. Below we consider a few examples, all of which involve orthogroups that were completely lost in the Euascomycota. In many cases, the lost gene has an ancient paralogue (duplicated prior to the last common ancestor of S. pombe and S. cerevisiae) which may have been able to assume its role ("de-specialization"). Nevertheless, we do observe certain examples of ancestral essential genes which were lost without having an apparent backup.

IFH (YLR223C) and CRF (YDR223W), Orthogroup #982; FHL (YPR4C), Orthogroup # 33
These transcription factors are essential for ribosomal protein gene expression in S. cerevisiae. FHL binds the IFHL site in ribosomal protein gene promoters, and activates and represses their expression based on the nutrient status of the


More information

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11 The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding

More information

7.06 Problem Set #4, Spring 2005

7.06 Problem Set #4, Spring 2005 7.06 Problem Set #4, Spring 2005 1. You re doing a mutant hunt in S. cerevisiae (budding yeast), looking for temperaturesensitive mutants that are defective in the cell cycle. You discover a mutant strain

More information

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin

More information

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information - Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes - Supplementary Information - Martin Bartl a, Martin Kötzing a,b, Stefan Schuster c, Pu Li a, Christoph Kaleta b a

More information

Number of questions TEK (Learning Target) Biomolecules & Enzymes

Number of questions TEK (Learning Target) Biomolecules & Enzymes Unit Biomolecules & Enzymes Number of questions TEK (Learning Target) on Exam 8 questions 9A I can compare and contrast the structure and function of biomolecules. 9C I know the role of enzymes and how

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species.

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species. Supplementary Figure 1 Icm/Dot secretion system region I in 41 Legionella species. Homologs of the effector-coding gene lega15 (orange) were found within Icm/Dot region I in 13 Legionella species. In four

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Eukaryotic vs. Prokaryotic genes

Eukaryotic vs. Prokaryotic genes BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:1.138/nature1213 Supplementary Table 1. The Taxonomy of the Organisms Used in this Study Organism (acronym) Taxonomy Yeasts Zygosacharomyces rouxii (Zrou) Verterbrates Xenopus tropicalis (Xtro) Gallus

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

1. In most cases, genes code for and it is that

1. In most cases, genes code for and it is that Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

More information

2. Cellular and Molecular Biology

2. Cellular and Molecular Biology 2. Cellular and Molecular Biology 2.1 Cell Structure 2.2 Transport Across Cell Membranes 2.3 Cellular Metabolism 2.4 DNA Replication 2.5 Cell Division 2.6 Biosynthesis 2.1 Cell Structure What is a cell?

More information

Introduction. Gene expression is the combined process of :

Introduction. Gene expression is the combined process of : 1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression

More information

Mitosis vs Meiosis. Mitosis and Meiosis -- Internet Tutorial

Mitosis vs Meiosis. Mitosis and Meiosis -- Internet Tutorial Mitosis and Meiosis -- Internet Tutorial In this internet lesson, you will review the steps of mitosis and meiosis and view video simulations of cell division. Mitosis: An Interactive Animation (http://www.cellsalive.com/mitosis.htm)

More information

9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on: 17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.

More information

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed

More information

Lecture 2: Read about the yeast MAT locus in Molecular Biology of the Gene. Watson et al. Chapter 10. Plus section on yeast as a model system Read

Lecture 2: Read about the yeast MAT locus in Molecular Biology of the Gene. Watson et al. Chapter 10. Plus section on yeast as a model system Read Lecture 2: Read about the yeast MAT locus in Molecular Biology of the Gene. Watson et al. Chapter 10. Plus section on yeast as a model system Read chapter 22 and chapter 10 [section on MATing type gene

More information

Oceans: the cradle of life? Chapter 5. Cells: a sense of scale. Head of a needle

Oceans: the cradle of life? Chapter 5. Cells: a sense of scale. Head of a needle Oceans: the cradle of life? Highest diversity of life, particularly archae, bacteria, and animals Will start discussion of life in the ocean with prokaryote microorganisms Prokaryotes are also believed

More information

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and counterclockwise for the inner row, with green representing coding

More information

Biology Science Crosswalk

Biology Science Crosswalk SB1. Students will analyze the nature of the relationships between structures and functions in living cells. a. Explain the role of cell organelles for both prokaryotic and eukaryotic cells, including

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Rule learning for gene expression data

Rule learning for gene expression data Rule learning for gene expression data Stefan Enroth Original slides by Torgeir R. Hvidsten The Linnaeus Centre for Bioinformatics Predicting biological process from gene expression time profiles Papers:

More information

SUPPLEMENTARY METHODS

SUPPLEMENTARY METHODS SUPPLEMENTARY METHODS M1: ALGORITHM TO RECONSTRUCT TRANSCRIPTIONAL NETWORKS M-2 Figure 1: Procedure to reconstruct transcriptional regulatory networks M-2 M2: PROCEDURE TO IDENTIFY ORTHOLOGOUS PROTEINSM-3

More information

Ensemble Non-negative Matrix Factorization Methods for Clustering Protein-Protein Interactions

Ensemble Non-negative Matrix Factorization Methods for Clustering Protein-Protein Interactions Belfield Campus Map Ensemble Non-negative Matrix Factorization Methods for Clustering Protein-Protein Interactions

More information

Biology I Fall Semester Exam Review 2014

Biology I Fall Semester Exam Review 2014 Biology I Fall Semester Exam Review 2014 Biomolecules and Enzymes (Chapter 2) 8 questions Macromolecules, Biomolecules, Organic Compunds Elements *From the Periodic Table of Elements Subunits Monomers,

More information

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

More information

Name: Date: Hour: Unit Four: Cell Cycle, Mitosis and Meiosis. Monomer Polymer Example Drawing Function in a cell DNA

Name: Date: Hour: Unit Four: Cell Cycle, Mitosis and Meiosis. Monomer Polymer Example Drawing Function in a cell DNA Unit Four: Cell Cycle, Mitosis and Meiosis I. Concept Review A. Why is carbon often called the building block of life? B. List the four major macromolecules. C. Complete the chart below. Monomer Polymer

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Comparative Network Analysis

Comparative Network Analysis Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

Protein-protein interaction networks Prof. Peter Csermely

Protein-protein interaction networks Prof. Peter Csermely Protein-Protein Interaction Networks 1 Department of Medical Chemistry Semmelweis University, Budapest, Hungary www.linkgroup.hu csermely@eok.sote.hu Advantages of multi-disciplinarity Networks have general

More information

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms Bioinformatics Advance Access published March 7, 2007 The Author (2007). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

More information

Whole-genome analysis of GCN4 binding in S.cerevisiae

Whole-genome analysis of GCN4 binding in S.cerevisiae Whole-genome analysis of GCN4 binding in S.cerevisiae Lillian Dai Alex Mallet Gcn4/DNA diagram (CREB symmetric site and AP-1 asymmetric site: Song Tan, 1999) removed for copyright reasons. What is GCN4?

More information

From gene to protein. Premedical biology

From gene to protein. Premedical biology From gene to protein Premedical biology Central dogma of Biology, Molecular Biology, Genetics transcription replication reverse transcription translation DNA RNA Protein RNA chemically similar to DNA,

More information

GACE Biology Assessment Test I (026) Curriculum Crosswalk

GACE Biology Assessment Test I (026) Curriculum Crosswalk Subarea I. Cell Biology: Cell Structure and Function (50%) Objective 1: Understands the basic biochemistry and metabolism of living organisms A. Understands the chemical structures and properties of biologically

More information

Presentation by Julie Hudson MAT5313

Presentation by Julie Hudson MAT5313 Proc. Natl. Acad. Sci. USA Vol. 89, pp. 6575-6579, July 1992 Evolution Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome (genomics/algorithm/inversions/edit distance/conserved

More information

Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi

Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi Resource Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi Michael J. Cornell, 1,2 Intikhab Alam, 1 Darren M. Soanes, 3 Han Min Wong,

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2 Cellular Neuroanatomy I The Prototypical Neuron: Soma Reading: BCP Chapter 2 Functional Unit of the Nervous System The functional unit of the nervous system is the neuron. Neurons are cells specialized

More information

S1 Gene ontology (GO) analysis of the network alignment results

S1 Gene ontology (GO) analysis of the network alignment results 1 Supplementary Material for Effective comparative analysis of protein-protein interaction networks by measuring the steady-state network flow using a Markov model Hyundoo Jeong 1, Xiaoning Qian 1 and

More information

Chapter Chemical Uniqueness 1/23/2009. The Uses of Principles. Zoology: the Study of Animal Life. Fig. 1.1

Chapter Chemical Uniqueness 1/23/2009. The Uses of Principles. Zoology: the Study of Animal Life. Fig. 1.1 Fig. 1.1 Chapter 1 Life: Biological Principles and the Science of Zoology BIO 2402 General Zoology Copyright The McGraw Hill Companies, Inc. Permission required for reproduction or display. The Uses of

More information

Computational Genomics. Reconstructing dynamic regulatory networks in multiple species

Computational Genomics. Reconstructing dynamic regulatory networks in multiple species 02-710 Computational Genomics Reconstructing dynamic regulatory networks in multiple species Methods for reconstructing networks in cells CRH1 SLT2 SLR3 YPS3 YPS1 Amit et al Science 2009 Pe er et al Recomb

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information