Supplemental Material

Protocol S1

The PromPredict algorithm considers the average free energy over a 100nt window and compares it with the average free energy over a downstream 100nt window separated from it by 50bp. Hence, to predict whether nucleotide n+50 is a promoter, the following parameters are determined, and E1(n+50) and D(n+50) are compared with their corresponding cutoffs from Table S1.

\[ E1(n+50) = \frac{\sum_{n}^{n+100} \Delta G^{0}}{100} \tag{1} \]

\[ E2(n+50) = \frac{\sum_{n+150}^{n+250} \Delta G^{0}}{100} \tag{2} \]

\[ D(n+50) = E1(n+50) - E2(n+50) \tag{3} \]

All such predicted consecutive nucleotides are grouped into the same prediction if they lie within 50bp of each other. The procedure for arriving at the cutoff values and the prediction method are described in detail by Rangannan and Bansal (2009).

Table S1: Cutoff values for E1 and D. The cutoff values are applied according to the GC content of the surrounding 1000nt fragment. The cutoffs have been derived by training on transcription start site (TSS) and translation start site (TLS) data of prokaryotes (Rangannan and Bansal, 2010) and match well with the AFE values obtained for upstream promoter and downstream non-promoter regions in Arabidopsis and rice, as shown in Fig. S3.

GC% Range    E1-cutoff    D-cutoff
15-20        -14.5        1.0
20-25        -15.1        1.0
25-30        -15.6        1.0
30-35        -16.3        1.0
35-40        -16.8        1.2
40-45        -17.3        1.4
45-50        -18.3        1.4
50-55        -19.2        1.4
55-60        -20.0        1.4
60-65        -21.0        1.2
65-70        -22.0        1.2
70-75        -23.0        1.2
75-80        -24.0        1.2
80-85        -25.0        1.2
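The calculation in Protocol S1 can be sketched in Python as follows. This is a minimal illustration, not the PromPredict implementation: it assumes the per-position free energy values are already available as a list `dg` (the energy parameters themselves are not given here), it treats each window as 100 consecutive values, and it assumes a position is predicted when both E1 and D are at or above their cutoffs. The function names and the fixed cutoffs passed as arguments are hypothetical; in the protocol the cutoffs are chosen from Table S1 according to the GC content of the surrounding 1000nt fragment.

```python
from typing import List, Sequence, Tuple

WINDOW = 100   # length of each averaging window (nt)
SPACER = 50    # gap between the upstream and downstream windows (bp)
MERGE = 50     # predicted positions within this distance are grouped


def window_scores(dg: Sequence[float], n: int) -> Tuple[float, float, float]:
    """Return (E1, E2, D) for position n + 50, following Eqs. (1)-(3)."""
    upstream = dg[n:n + WINDOW]
    downstream = dg[n + WINDOW + SPACER:n + 2 * WINDOW + SPACER]
    e1 = sum(upstream) / WINDOW
    e2 = sum(downstream) / WINDOW
    return e1, e2, e1 - e2


def predict_positions(dg: Sequence[float], e1_cutoff: float,
                      d_cutoff: float) -> List[int]:
    """Positions n + 50 that pass both cutoffs (assumed direction: >=)."""
    hits = []
    for n in range(len(dg) - (2 * WINDOW + SPACER) + 1):
        e1, _, d = window_scores(dg, n)
        if e1 >= e1_cutoff and d >= d_cutoff:
            hits.append(n + 50)
    return hits


def group_positions(hits: Sequence[int],
                    max_gap: int = MERGE) -> List[Tuple[int, int]]:
    """Merge sorted predicted positions lying within max_gap of each other."""
    groups: List[Tuple[int, int]] = []
    for pos in hits:
        if groups and pos - groups[-1][1] <= max_gap:
            groups[-1] = (groups[-1][0], pos)   # extend the current group
        else:
            groups.append((pos, pos))           # start a new group
    return groups
```

For a toy profile with a less stable upstream window, for example `[-15.0] * 100 + [-20.0] * 150`, and the 40-45% GC cutoffs from Table S1 (E1 = -17.3, D = 1.4), `predict_positions` reports a single hit at position 50.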

[Figure S1: two histograms, (A) and (B), of frequency versus GC% range.]

Figure S1: GC content distribution for (A) Arabidopsis and (B) rice sequences in the vicinity of the transcription start site (TSS). The frequency distributions of GC content in a 1000bp segment (−500bp to +500bp) around the TSS, the upstream sequence (−500bp to −100bp), the downstream sequence (+100bp to +500bp) and the sequence adjacent to the TSS (−80bp to +20bp) are shown. The GC content of Arabidopsis promoters shows a narrow range within each subclass and small differences between the subclasses. The GC content of the rice promoters shows a broad range with large differences between subclasses, especially between the upstream and downstream regions.

[Figure S2: two panels, (A) and (B), of average free energy (kcal/mol) versus nucleotide position. GC content ranges plotted: (A) 25–30%, 30–35%, 35–40%, 40–45%, 45–50%; (B) 35–40%, 40–45%, 45–50%, 50–55%, 55–60%, 60–65%, 65–70%.]

Figure S2: AFE profiles for 1000bp sequences (−500 to +500bp with respect to the TSS) in different ranges of GC content are shown for genes from chr. 1 of (A) Arabidopsis and (B) rice.

Figure S3: (A) AFE profiles in the region −2000bp to +2000bp in the vicinity of the TSS for chr. 1 of Arabidopsis and rice. (B) The threshold values of free energy used to predict promoters in genomic DNA with GC content varying from 20 to 80%, as derived from prokaryotic genomic data, are shown in black. The filled circles correspond to the minimal free energy threshold (E1) for a particular fragment and the hollow circles to the free energy (E2) in its downstream region, for it to be assigned as a putative promoter. The average free energy values in a representative set of 1001nt long sequences flanking the TSSs in Arabidopsis and rice are shown as red and green squares, respectively. The AFE values for the +500 to +1000 downstream regions in Arabidopsis and rice are shown as hollow squares connected by dashed lines.

[Figure S4: Dmax value versus GC% of 1000nt fragment for rice chr1 predictions (74762) and Arabidopsis chr1 predictions (49706), with lines marking Mean − S.D., Mean, Mean + S.D. and Mean + 2S.D.]

Figure S4: Cutoff values for score classes were determined from the mean and standard deviation of Dmax values from Arabidopsis and rice predictions.

Table S2: Cutoff values based on the mean and standard deviation of the maximum D value (Dmax) of Arabidopsis and rice predictions.

                             Dmax cutoff
GC% Range   No. of seq(a)   Mean − S.D.   Mean   Mean + S.D.   Mean + 2S.D.
15-20       113             1.14          2.00   2.86          3.73
20-25       1789            1.20          2.08   2.95          3.83
25-30       9405            1.28          2.23   3.19          4.14
30-35       26073           1.30          2.34   3.37          4.41
35-40       37942           1.41          2.46   3.51          4.56
40-45       20018           1.60          2.87   4.15          5.42
45-50       11709           1.66          3.26   4.85          6.45
50-55       7195            1.94          3.66   5.39          7.11
55-60       4725            2.16          3.90   5.64          7.37
60-65       2903            1.88          3.66   5.45          7.23
65-70       1836            1.50          3.06   4.62          6.19
70-75       726             1.38          2.52   3.66          4.80
75-80       24              1.30          2.71   4.12          5.54

(a) Sequences pooled from both rice and Arabidopsis predictions.
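The way such cutoffs could be derived and applied is sketched below. This is only an illustration under stated assumptions: the sample standard deviation is used (the text does not say which estimator was applied), the function names are hypothetical, and the integer "level" returned is simply the number of thresholds a Dmax value reaches, not a score class name taken from the source.

```python
from bisect import bisect_right
from statistics import mean, stdev
from typing import Sequence, Tuple


def dmax_thresholds(dmax_values: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (Mean - S.D., Mean, Mean + S.D., Mean + 2S.D.) for one GC% bin."""
    m = mean(dmax_values)
    sd = stdev(dmax_values)   # assumption: sample standard deviation
    return (m - sd, m, m + sd, m + 2 * sd)


def score_level(dmax: float, thresholds: Sequence[float]) -> int:
    """Number of thresholds (0 to 4) that dmax is greater than or equal to."""
    return bisect_right(list(thresholds), dmax)
```

For example, with the 40-45% GC row of Table S2, `score_level(3.0, (1.60, 2.87, 4.15, 5.42))` returns 2, since 3.0 lies between the Mean and Mean + S.D. cutoffs.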

[Figure S5: two scatter plots, (A) and (B), of % frequency of each tetramer in the −50 to −20 bp region of the TSS versus its % frequency in the −500 to +500 bp region of the TSS, with individual tetramers labelled.]

Figure S5: Percentage frequency of occurrence of tetramers in the core promoter (−50 to −20 bp) as compared to that in the 1000nt (−500 to +500 bp) region in the vicinity of the TSS for (A) Arabidopsis and (B) rice.
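A comparison of this kind can be sketched as follows. The sketch assumes that overlapping tetramers are counted and that each sequence is supplied together with the index of its TSS; neither detail is stated above, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def region(seq: str, tss_index: int, start: int, end: int) -> str:
    """Subsequence from start to end (inclusive), given relative to the TSS."""
    lo = max(0, tss_index + start)
    return seq[lo:tss_index + end + 1]


def tetramer_percent(seqs: Iterable[str]) -> Dict[str, float]:
    """Percentage frequency of each overlapping tetramer across all sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - 3):
            counts[s[i:i + 4]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: 100.0 * c / total for kmer, c in counts.items()}
```

Calling `tetramer_percent` once on the `region(seq, tss, -50, -20)` slices and once on the `region(seq, tss, -500, 500)` slices gives the two percentages plotted against each other for every tetramer.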

Protocol S2

Gene Ontology: The genes categorized in the Gene Ontology GO SLIM categories were downloaded from TAIR (http://www.arabidopsis.org) for Arabidopsis and from AmiGO (The Gene Ontology Consortium, 2000; Carbon et al., 2009) for rice. The percentage of genes that have TP predictions in the region −500 to +100bp with respect to the TSS (TPgenes) and the percentage that do not have any predictions in this region (FNgenes) were determined.
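The gene-level bookkeeping described in Protocol S2 might look like the sketch below. It assumes that each prediction is represented by a single genomic coordinate, that each gene is described by its TSS coordinate and strand, and that the −500 to +100bp window is mirrored for genes on the minus strand; the data structures and function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

UPSTREAM = 500     # bp upstream of the TSS included in the window
DOWNSTREAM = 100   # bp downstream of the TSS included in the window


def tss_window(tss: int, strand: str) -> Tuple[int, int]:
    """Genomic (low, high) bounds of the -500 to +100 window around a TSS."""
    if strand == "+":
        return tss - UPSTREAM, tss + DOWNSTREAM
    # On the minus strand, upstream lies at higher genomic coordinates.
    return tss - DOWNSTREAM, tss + UPSTREAM


def genes_with_prediction(genes: Dict[str, Tuple[int, str]],
                          predictions: Iterable[int]) -> Dict[str, bool]:
    """Map each gene ID to True if any prediction falls inside its window."""
    preds = list(predictions)
    flags = {}
    for gene_id, (tss, strand) in genes.items():
        lo, hi = tss_window(tss, strand)
        flags[gene_id] = any(lo <= p <= hi for p in preds)
    return flags


def percent_with_prediction(flags: Dict[str, bool]) -> float:
    """Percentage of genes that have at least one prediction in the window."""
    return 100.0 * sum(flags.values()) / len(flags) if flags else 0.0
```

Applying `percent_with_prediction` to the genes of one GO category gives the percentage of genes with a prediction in the window; the remainder are the genes without any prediction in this region.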

[Figure S6: three horizontal bar charts, (A), (B) and (C), with bars for Chr 1 (TPgenes) and Chr 1, plotted as % of genes from the whole genome. Categories shown: (A) regulation_of_transcription, proteolysis, oxidation_reduction, metabolic_process, DNA_integration, chromatin_assembly_or_disassembly, carbohydrate_metabolic_process, apoptosis; (B) RNA_binding, zinc_ion_binding, protein_binding, nucleic_acid_binding, kinase_activity, DNA_binding, catalytic_activity, ATP_binding, aspartic-type_endopeptidase_activity; (C) extracellular_region, actin_cytoskeleton, plastid, nucleus, membrane, intracellular, integral_to_membrane, cytoplasm.]

Figure S6: Gene representation in GO categories for (A) Biological Process, (B) Molecular Function and (C) Cellular Component. The figure shows genes from rice chr. 1 for each GO category as a percentage of the genes in the whole genome present in that category. The red bar indicates the TP genes, while the blue bar indicates all genes. The number of chr. 1 genes assigned to each category is given adjacent to each bar.

Table S3: Distribution of Arabidopsis TP and FN genes in GO categories as a percentage of the total genes in the genome present in that category.

Functional Category                       Total genes   TP: % of Total   FN: % of Total

Cellular Component
extracellular                             359           95.265           4.178
cell wall                                 485           94.639           5.361
other cellular components                 2406          93.724           6.359
plasma membrane                           1827          93.596           6.568
nucleus                                   2079          93.362           6.590
plastid                                   969           92.570           7.946
other membranes                           2528          92.366           7.872
Golgi apparatus                           207           92.271           9.179
chloroplast                               2635          91.879           8.008
ER                                        348           91.667           7.759
other cytoplasmic components              2696          91.320           8.828
other intracellular components            3415          91.274           8.726
cytosol                                   664           91.114           9.337
unknown cellular components               5050          91.030           8.772
mitochondria                              881           90.919           9.535
ribosome                                  388           86.856           13.402

Molecular Function
transcription factor activity             1188          94.737           5.343
nucleic acid binding                      615           94.181           5.360
other enzyme activity                     2481          93.764           6.652
other binding                             2579          93.748           6.361
hydrolase activity                        1700          93.253           6.802
transporter activity                      864           93.204           6.580
other molecular functions                 998           93.010           6.710
transferase activity                      1775          92.641           8.038
nucleotide binding                        1443          92.322           7.550
DNA or RNA binding                        1428          92.248           8.075
kinase activity                           877           92.122           8.298
protein binding                           1787          91.735           8.419
receptor binding or activity              118           91.473           6.977
unknown molecular functions               5108          91.443           8.342
structural molecule activity              387           87.755           12.245

Biological Process
transcription                             1231          94.801           5.280
signal transduction                       771           94.034           6.096
other biological processes                1396          93.625           6.734
response to stress                        1529          93.525           6.867
response to abiotic or biotic stimulus    1487          93.208           7.061
developmental processes                   1386          92.857           7.937
other metabolic processes                 6503          92.773           7.335
other cellular processes                  6949          92.488           7.541
protein metabolism                        2378          92.010           7.906
transport                                 1364          91.862           7.478
cell organization and biogenesis          976           91.496           7.992
unknown biological processes              6499          91.429           8.340
electron transport or energy pathways     220           91.364           9.545
DNA or RNA metabolism                     271           90.775           8.487

Table S4: Gene IDs for orthologous genes selected for comparison.

Gene                          Arabidopsis gene ID   Rice gene ID
Aspartate aminotransferase    At2g30970             Os02g0236000
Cu/Zn superoxide dismutase    At5g18100             Os03g0219200
Dof genes                     At3g52400             Os03g0787000
P-type ATPase                 At1g07670             Os03g0281600
FAD2                          At3g12120             Os02g0716500
PRF1                          At2g19760             Os10g0323600

References

Carbon, S., Ireland, A., Mungall, C. J., Shu, S., Marshall, B., Lewis, S., the AmiGO Hub, and the Web Presence Working Group (2009). AmiGO: online access to ontology and annotation data. Bioinformatics, 25:288–9.

Rangannan, V. and Bansal, M. (2009). Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition. Mol. Biosyst., 5:1758–69.

Rangannan, V. and Bansal, M. (2010). High quality annotation of promoter regions for 913 bacterial genomes. Bioinformatics, 26:3043–50.

The Gene Ontology Consortium (2000). Gene ontology: tool for the unification of biology. Nat. Genet., 25:25–9.