Improving Gene Functional Analysis in Ethylene-induced Leaf Abscission using GO and ProteInOn

Similar documents
Francisco M. Couto Mário J. Silva Pedro Coutinho

BMD645. Integration of Omics

Mixture models for analysing transcriptome and ChIP-chip data

Network by Weighted Graph Mining

4CitySemantics. GIS-Semantic Tool for Urban Intervention Areas

Comparative RNA-seq analysis of transcriptome dynamics during petal development in Rosa chinensis

Integration of functional genomics data

I. Molecules and Cells: Cells are the structural and functional units of life; cellular processes are based on physical and chemical changes.

Predicting Protein Functions and Domain Interactions from Protein Interactions

Dynamic modular architecture of protein-protein interaction networks beyond the dichotomy of date and party hubs

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón

Virginia Western Community College BIO 101 General Biology I

I. Molecules & Cells. A. Unit One: The Nature of Science. B. Unit Two: The Chemistry of Life. C. Unit Three: The Biology of the Cell.

Compare and contrast the cellular structures and degrees of complexity of prokaryotic and eukaryotic organisms.

Toponym Disambiguation using Ontology-based Semantic Similarity

Discovering molecular pathways from protein interaction and ge

AP BIOLOGY SUMMER ASSIGNMENT

Biology Assessment. Eligible Texas Essential Knowledge and Skills

STAAR Biology Assessment

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms

Inductive logic programming for gene regulation prediction

Data visualization and clustering: an application to gene expression data

Bio 101 General Biology 1

2. Cellular and Molecular Biology

Computational methods for predicting protein-protein interactions

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

A transcriptome meta-analysis identifies the response of plant to stresses. Etienne Delannoy, Rim Zaag, Guillem Rigaill, Marie-Laure Martin-Magniette

Course Descriptions Biology

Miller & Levine Biology 2010

Genome-Scale Gene Function Prediction Using Multiple Sources of High-Throughput Data in Yeast Saccharomyces cerevisiae ABSTRACT

Slide 1 / Describe the setup of Stanley Miller s experiment and the results. What was the significance of his results?

Marine Resources Development Foundation/MarineLab Grades: 9, 10, 11, 12 States: AP Biology Course Description Subjects: Science

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Supplementary Information

Outline. Terminologies and Ontologies. Communication and Computation. Communication. Outline. Terminologies and Vocabularies.

Arabidopsis PPR40 connects abiotic stress responses to mitochondrial electron transport

Computational Genomics. Reconstructing dynamic regulatory networks in multiple species

COMPETENCY GOAL 1: The learner will develop abilities necessary to do and understand scientific inquiry.

Introduction to Bioinformatics

October 08-11, Co-Organizer Dr. S D Samantaray Professor & Head,

An introduction to SYSTEMS BIOLOGY

EASTERN ARIZONA COLLEGE General Biology I

Field 045: Science Life Science Assessment Blueprint

Vance County Early College High School Pacing Guide Course: Introduction to Biology (Semester I)

Total

SPRINGFIELD TECHNICAL COMMUNITY COLLEGE ACADEMIC AFFAIRS

What is Systems Biology

Unit 1: Chemistry of Life Guided Reading Questions (80 pts total)

Nature Structural and Molecular Biology: doi: /nsmb Supplementary Figure 1

Contra Costa College Course Outline

Goal 1: Develop knowledge and understanding of core content in biology

ADVANCED PLACEMENT BIOLOGY

Peddie Summer Day School

BIOLOGY IB DIPLOMA PROGRAM PARENTS ORIENTATION DAY

Supplemental Material

Abiotic Stress in Crop Plants

Text of objective. Investigate and describe the structure and functions of cells including: Cell organelles

86 Part 4 SUMMARY INTRODUCTION

Curriculum Map. Biology, Quarter 1 Big Ideas: From Molecules to Organisms: Structures and Processes (BIO1.LS1)

MINISYMPOSIUM LOGICAL MODELLING OF (MULTI)CELLULAR NETWORKS

Miller & Levine Biology

VCE BIOLOGY Relationship between the key knowledge and key skills of the Study Design and the Study Design

Modesto Junior College Course Outline of Record BIO 101

GACE Biology Assessment Test I (026) Curriculum Crosswalk

Area of Focus: Biology. Learning Objective 1: Describe the structure and function of organs. Pre-Learning Evaluation: Teaching Methods and Process:

Introduction to Biology

Name Date Period Unit 1 Basic Biological Principles 1. What are the 7 characteristics of life?

CATEGORY a TERM COUNT b P VALUE GENE FAMILIES: REPRESENTATIVE GENE SYMBOLS c. Annotation Cluster 1 Enrichment Score: 1.

Local topological signatures for network-based prediction of biological function

Utilizing Illumina high-throughput sequencing technology to gain insights into small RNA biogenesis and function

Ph.D. thesis. Study of proline accumulation and transcriptional regulation of genes involved in this process in Arabidopsis thaliana

CHAPTER 1 Life: Biological Principles and the Science of Zoology

2. What properties or characteristics distinguish living organisms? Substance Description Example(s)

Gene Ontology and overrepresentation analysis

Computational Structural Bioinformatics

Supplementary Materials for R3P-Loc Web-server

2012 Biology 1 End-of-Course (EOC) Assessment Form 1

Identify stages of plant life cycle Botany Oral/written pres, exams

Measuring Semantic Similarity between Gene Ontology Terms

Function Prediction Using Neighborhood Patterns

Dynamical Modeling in Biology: a semiotic perspective. Junior Barrera BIOINFO-USP

Mixtures and Hidden Markov Models for analyzing genomic data

Valley Central School District 944 State Route 17K Montgomery, NY Telephone Number: (845) ext Fax Number: (845)

CSCE555 Bioinformatics. Protein Function Annotation

Information Extraction from Biomedical Text. BMI/CS 776 Mark Craven

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

Supplementary Information 16

Cluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002

Gabriella Rustici EMBL-EBI

Context dependent visualization of protein function

Lowndes County Biology II Pacing Guide Approximate

Miller & Levine Biology 2014

Lecture 3: A basic statistical concept

Model Worksheet Student Handout

Chetek-Weyerhaeuser High School

1. CHEMISTRY OF LIFE. Tutorial Outline

Introduction to the Study of Life

Administrative - Master Syllabus COVER SHEET

Unit # - Title Intro to Biology Unit 1 - Scientific Method Unit 2 - Chemistry

Transcription:

Improving Gene Functional Analysis in Ethylene-induced Leaf Abscission using GO and ProteInOn Sara Domingos 1, Cátia Pesquita 2, Francisco M. Couto 2, Luis F. Goulao 3, Cristina Oliveira 1 1 Instituto Superior de Agronomia, Universidade Técnica de Lisboa, Tapada da Ajuda, 1349-017 Lisboa, Portugal saradomingos@gmail.com, crismoniz@isa.utl.pt 2 Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749 016 Lisboa, Portugal cpesquita@xldb.di.fc.ul.pt, fcouto@di.fc.ul.pt 3 Eco-Bio, Instituto de Investigação Científica Tropical IP, Av. da República, Quinta do Marquês,2784-505 Oeiras, Portugal goulao@iict.pt Abstract. In this study we apply a new strategy to select candidate genes involved in the abscission of citrus leaves based on the identification of the most meaningful functional aspects of differentially expressed genes using ProteInOn, a Gene Ontology (GO) based web tool. We submitted a previously analyzed dataset obtained from a microarray experiment to the tool, to obtain the most meaningful GO terms annotated to the given gene products, highlighting the relevant functional processes involved with a greater detail than the original analysis and supporting identification of the genes more likely to be involved in the studied phenomenon. This strategy proved to be effective by complementing the previous analysis with a wider knowledge about the biological processes and gene products more relevant to ethylene-induced leaf abscission. Keywords: ethylene-induced leaf abscission, Citrus clementina, functional analysis, meaningful terms, semantic similarity. 1 Introduction Abscission is a cell separation process by which plant organs are detached, by reduction of cell adhesion and cell wall disassembly in specialized cells, named the abscission zone, and represents an important agricultural phenomenon which affects final yield [1]. The accumulation of data generated by high-throughput techniques of gene expression analysis requires bioinformatics tools and defined vocabularies to describe genes functions that allow its integration and computation. This study aims to improve functional analysis of a list of differentially expressed genes, using the Gene Ontology (GO) and the ProteInOn web tool. To perform it, we

used microarray experiment results obtained in [1] where a functional categorization from MIPS (Munich Information Center for Protein Sequences) was applied. MIPS functional catalogue database (FunCat) provides hierarchical information derived from a few species [2] assigning a p-value for each category. This requires translation into Arabidopsis thaliana orthologues. Nowadays, functional categorization of transcripts is mostly based on GO which provides a controlled vocabulary to describe protein function for all eukaryotes, based on 3 orthogonal ontologies: biological processes, cellular components and molecular functions [3]. To analyze the gene set, we chose ProteInOn, a GO-based web tool focused on identifying the most meaningful functional aspects of related gene products [4]. It uses the hierarchical structure of GO to find representative terms, while typical enrichment strategies use direct annotations and usually fail to find general but meaningful annotations. ProteInOn also measures the specificity of each GO term based on information content (IC) and calculates semantic similarity (SSM) which is a function that, given two sets of terms annotating two entities, returns a value reflecting the closeness in meaning between them. There are more than other 68 enrichment analysis tools available (e.g. Onto-Express, FatiGO), based on the premise that co-functioning genes should have a higher potential to be selected, if a biological process is irregular in a study [6]. Our main goal was to apply the functional analysis strategy detailed in [7] and to compare these results with the original ones, while comparing different gene products functional classifications, GO and FunCat. By using updated databases and bioinformatics tools to analyze previously results, we aim to provide insight into the transcriptome abscission-related to support further experiments. 2 Methods The data set used for functional analysis was obtained from [1]. The authors aimed to identify and classify genes involved in the ethylene-induced leaf abscission process in Citrus clementina, and used a cdna microarray to analyze transcriptome of laminar abscission zone (LAZ) and petiole of leaf explants (Pet). The functional analysis strategy applied here was as follows: i) A list of genes was organized according to moment and organ where genes were over-represented [1] and translated into UniProtKB accession numbers. ii) GO annotations were analyzed through ProteInOn, using UniProtKB accession numbers as input. For genes absent in UniProtKB database, the code of a putative A.th orthologue was used. iii) The ten most meaningful GO terms were selected. Meaningfulness was given by the term representativity measure (e-value) and calculated using the global

frequency of each GO term annotation in the database as an estimator of its probability of occurrence. For each GO term, a protein taken randomly from the dataset is a random event with two outcomes: success if annotated to t, or failure. The probability of observing at least k successes in a random set of n gene products is given by the probability of success P(t) (1). IC values were calculated using (2), where f(t) is the frequency of annotation of the term t [7]. n P (x1 k) = Σ ( i n ) P(t) I (1-p(t)) n-i. i=k (1) IC(t) = log 2 f(t). iv. The SSM between selected gene products was calculated with simgic measure [5], which has been shown to outperform other measures. SimGIC is given by the sum of the IC of each shared term by two gene products, A and B, divided by the sum of the IC of all their terms (3): (2) simgic (A,B) = Σ te{go(a) GO(B)} IC(t) Σ te{go(a)ugo(b)}ic(t). (3) v. Gene products with SSM > 80% were clustered and the remaining ones were analyzed. vi. GO annotations for both the clusters and relevant individual gene products were analyzed and compared with their previous MIPS classification. 3 Results The list of differentially expressed genes in LAZ and Pet was divided by the two main moments after ethylene treatment (Supplementary Data 1). In the first moment (6h and 12h after treatment) ten differently expressed genes were found in the LAZ. In the second moment (24h and 36 h after treatment) 23 over-represented genes were found in the LAZ and 31 in the Pet, three of them in common with the first phase. The initial set was composed by 61 gene products of which 56 had UniProtKB accession number, 52 had annotations on GO, of which 84% were inferred from electronic annotations. The authors of the experimental assay associated each over-represented transcript to the matching gene, first identified in a single species, which we used as input in UniProtKB, instead of finding the A.th orthologue to use MIPS [1]. From the term representativity analysis we obtained a list of relevant GO terms ranked by e-value. Figure 1 and Supplementary Data 2 show the ten most meaningful biological process terms of each set, since it is the most interesting GO type for the identification of the relevant genes in a biological phenomenon. For the LAZ, in the

first moment, the most meaningful GO terms had a low IC, but in the second moment we found annotations with a higher specificity, such as lipid transport and cell wall organization (Supplementary Data 2). Figure 1 shows the ten most meaningful terms of over-represented gene products in the Pet, which have a high IC and relevant biologic meaning for the abscission process, as chitin and other cell wall molecular catabolic processes, cell wall biosynthesis, polysaccharide metabolism, response to stress and oxidative stress and response to stimulus and chemical stimulus. We calculated the SSM for the subset of gene products that are annotated with these meaningful GO terms and built clusters of gene products with SSM > 80% (Table 1). Relevant GO terms associated to remaining gene products were also selected (Supplementary Data 3). Then the annotations of the selected gene products were analyzed (Table 2). In the LAZ, in the first phase, we found gene products involved in photosynthetic processes, response to biotic and abiotic stimulus, protein metabolism, lipid metabolism and floral organ abscission, while in the second phase gene products were related to carbohydrate and protein metabolism, molecular transport and cell wall organization (Tables 1, 2 and Supplementary Data 3). In the PET, there was more processes involved (Supplementary Data 4): DNA transcription, cell wall organization and biosynthesis, response to abiotic stress, oxidation-reduction, hormonal regulation, protein metabolism, molecular catabolism and transport. 10 8 6 4 2 0 Frequency of the 10 most meaningful GO terms Term Id Term Name IC GO:0006032 chitin catabolic process 0.45 GO:0006979 response to oxidative stress 0.37 GO:0016998 cell wall macromolecule catabolic p. 0.38 GO:0042221 response to chemical stimulus 0.25 GO:0071554 cell wall organization or biogenesis 0.27 GO:0005976 polysaccharide metabolic process 0.24 GO:0006950 response to stress 0.19 GO:0050896 response to stimulus 0.16 GO:0009056 catabolic process 0.20 GO:0070887 cellular response to chemical stimulus 0.39 Fig. 1. Ten most meaningful GO terms and corresponding IC, of biological process type, annotated to gene products over-represented in Pet, in the second moment (Source: ProteInOn). Table 1. Clusters of gene products over-represented in LAZ e Pet. LAZ/ second moment Pet/ second moment Clusters of gene products (Uniprot ac. number) Term name Q6EV47, Q8L5S8 lipid transport Q81746, Q94C86 cellular cell wall organization A0MKC8, Q8L4F2, Q9SF40 translation O82547, Q43752, Q8H985, Q8H986 A9PHA0, A9UFX7, Q0ZA67 chitin catabolic process cell wall macromolecule catabolic proc. response to oxidative stress oxidation-reduction process

Table 2. Functional characterization of isolated gene products over-represented in LAZ, in the first moment after ethylene treatment, of biological process GO terms type. LAZ/ first moment Term Name O 8 0 3 9 7 regulation of stomatal complex patterning regulation of stomatal complex development floral organ abscission jasmonic acid metabolic process plant-type hypersensitive response defense response, incompatible interaction oxylipin biosynthetic process protein phosphorylation response to wounding response to stress Q 8 L 6 Y 9 Q 9 A R 0 7 4 Discussion and Conclusion Using the current versions of UniProtKB and GO annotations, we obtained an improvement on the coverage of the initial gene list with only four terms without GO annotations associated comparing with MIPS categorization [1]. Over-represented gene products with manual annotations were 15.38% of the transcripts, which probably means that the studied phenomenon has been gaining interest. By selecting the clusters of gene products with high SSM and the relevant isolated gene products, we found gene products with an interesting set of annotations for ethylene-induced leaf abscission. Analyzing this restricted list, for example in the Pet, in the second moment after ethylene treatment (Supplementary Data 4) we found that they are associated with other important processes such as hormonal regulation, molecular transport, DNA transcription, protein metabolism, aromatic compounds metabolism and organ formation. In the last phase of abscission, a protective layer is generated in the organ that remains attached to the plant [1], which can be related with the gene product annotated with organ formation term. MIPS categorization indicated that transcriptional activation during abscission was involved in defense, transport mechanisms, signal transduction and protein biosynthesis, comparing it with our strategy we obtained a more comprehensive characterization, including cell wall biosynthesis, hormonal regulation and biosynthesis, protein metabolism, catabolic process, molecular transport and response to biotic/abiotic stresses, and other detailed information. The adopted strategy using ProteInOn, with the empirical choice of the ten most meaningful terms and clustering based on SSM > 80%, proved to be effective in providing a wider understanding of the process under study. A gene product that seems to have a key role is ATMKK4 (O80397) which is annotated with very relevant terms: floral organ abscission, stomatal complex patterning and

development regulation, plant-type hypersensitive response, defense, response to stress and protein phosphorylation. For this specific gene product, and to account for the bias in basing our analysis in a GO 2011 version and comparing it to results obtained with a MIPS 2008 version, we investigated its functional description in MIPS 2011 and GO 2008. In MIPS 2011 it is related to defense, response to biotic stimulus, protein phosporylation and protein kinase function, while in GO 2008 it is annotated with sensitive response, protein phosphorylation and defense. Hence, GO did gain relevant information about this gene since 2008, while MIPS did not. By reanalyzing previously obtained results with more up to date bioinformatics tools and databases, we were able to improve the functional analysis of differentially expressed genes and the identification of gene products relevant to the studied phenomenon. This strategy can be an important tool to guide future experiments, saving time and materials by avoiding experiment replication. Supplementary Data. Supplementary data are available at http://xldb.fc.ul.pt/xldb/images/e/e2/supplementary_data.pdf. Acknowldegements. This work was supported by the Portuguese Fundação para a Ciência e Tecnologia through the Multiannual Funding Programme for LaSIGE, and the PhD grants SFRH/BD/42481/2007 and SFRH/BD/69076/2010. 5 References 1. Agusti, J., Merelo, P., Cercós, M., Tadeo, F.R., Talón, M.: Ethylene-induced Differential Gene Expression during Abscission of Citrus Leaves. J. Exp. Bot. 59, 2717-33 (2008) 2. Ruepp, A., Zollner, A., Maier, D., Albermann, K., Hani, J., Mokrejs, M., Tetko, I., Guldener, U., Mannhaupt, G., Munsterkotter, M., Mewes, H.W.: The FunCat, a Functional Annotation Scheme for Systematic Classification of Proteins from Whole Genomes. Nucleic Acids Res. 32, 5539-45 (2004) 3. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat. Genet. 25(1):25-29 (2000) 4. Faria, D., Pesquita, C., Couto, F., Falcao, A.,: ProteInOn: A Web Tool for Protein Semantic Similarity. DI/FCUL TR 07-6, Department of Informatics, University of Lisbon (2007) 5. Pesquita C, Faria D, Bastos H, Falcão AO, Couto F.: Evaluating GO-based Semantic Similarity Measures. ISMB/ECCB 2007 SIG Meeting Program Materials, International Society for Computational Biology (2007) 6. Huang, daw., Sherman, B.T., Lempicki, R.A.: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37,1-13 (2009) 7. Bastos, H., Tavares, B., Pesquita, C., Faria, D., Couto F.: Application of Gene Ontology to Gene Identification. In: Yu, B., Hinchcliffe, M., (eds) In Silico Tools for Gene Discovery. Springer Science (in press)