Framework for a Protein Ontology

Similar documents
The Protein Ontology: An Evolution

Framework for a Protein Ontology

Protein Ontology (PRO)

S1 Gene ontology (GO) analysis of the network alignment results

Biol403 - Receptor Serine/Threonine Kinases

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

AP Biology Essential Knowledge Cards BIG IDEA 1

Supplementary Information 16

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón

Protein Architecture V: Evolution, Function & Classification. Lecture 9: Amino acid use units. Caveat: collagen is a. Margaret A. Daugherty.

Life Sciences 1a: Section 3B. The cell division cycle Objectives Understand the challenges to producing genetically identical daughter cells

BMD645. Integration of Omics

Types of biological networks. I. Intra-cellurar networks

CSCE555 Bioinformatics. Protein Function Annotation

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

Introduction to Bioinformatics

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Advanced Higher Biology. Unit 1- Cells and Proteins 2c) Membrane Proteins

Gene Ontology. Shifra Ben-Dor. Weizmann Institute of Science

Biology I Fall Semester Exam Review 2014

Computational Structural Bioinformatics

Number of questions TEK (Learning Target) Biomolecules & Enzymes

Computational Biology: Basics & Interesting Problems

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Outline. Evolution: Speciation and More Evidence. Key Concepts: Evolution is a FACT. 1. Key concepts 2. Speciation 3. More evidence 4.

Gene function annotation

Molecular evolution - Part 1. Pawan Dhar BII

Eukaryotic vs. Prokaryotic genes

Multiple Choice Review- Eukaryotic Gene Expression

Homology and Information Gathering and Domain Annotation for Proteins

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

AP Biology Gene Regulation and Development Review

1

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Valley Central School District 944 State Route 17K Montgomery, NY Telephone Number: (845) ext Fax Number: (845)

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Signaling to the Nucleus by an L-type Calcium Channel- Calmodulin Complex Through the MAP Kinase Pathway

Activation of a receptor. Assembly of the complex

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Essential knowledge 1.A.2: Natural selection

VCE BIOLOGY Relationship between the key knowledge and key skills of the Study Design and the Study Design

Translation Part 2 of Protein Synthesis

A A A A B B1

Computational methods for predicting protein-protein interactions

Introduction to Molecular and Cell Biology

BME 5742 Biosystems Modeling and Control

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

A Protein Ontology from Large-scale Textmining?

Compare and contrast the cellular structures and degrees of complexity of prokaryotic and eukaryotic organisms.

AP Biology Curriculum Framework

Information Extraction from Biomedical Text. BMI/CS 776 Mark Craven

Sequence Alignment Techniques and Their Uses

11/24/13. Science, then, and now. Computational Structural Bioinformatics. Learning curve. ECS129 Instructor: Patrice Koehl

Reproduction- passing genetic information to the next generation

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Graph Alignment and Biological Networks

Cladistics and Bioinformatics Questions 2013

Big Idea 3: Living systems store, retrieve, transmit, and respond to information essential to life processes.

Text of objective. Investigate and describe the structure and functions of cells including: Cell organelles

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

AP Curriculum Framework with Learning Objectives

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Cell Cell Communication in Development

Analysis and visualization of protein-protein interactions. Olga Vitek Assistant Professor Statistics and Computer Science

Prentice Hall. Biology: Concepts and Connections, 6th edition 2009, (Campbell, et. al) High School

Map of AP-Aligned Bio-Rad Kits with Learning Objectives

Chromosomes. Which chromosomes in a nucleus are similar? Which are identical? Which are completely different?

Research Article HomoKinase: A Curated Database of Human Protein Kinases

COMPETENCY GOAL 1: The learner will develop abilities necessary to do and understand scientific inquiry.

Full file at CHAPTER 2 Genetics

Bioinformatics Exercises

Gene Control Mechanisms at Transcription and Translation Levels

BIOLOGY STANDARDS BASED RUBRIC

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

Multiple Sequence Alignment. Sequences

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

SPA for quantitative analysis: Lecture 6 Modelling Biological Processes

Genomics and bioinformatics summary. Finding genes -- computer searches

Lipniacki 2004 Ground Truth

Mathematical Modeling and Analysis of Crosstalk between MAPK Pathway and Smad-Dependent TGF-β Signal Transduction

UE Praktikum Bioinformatik

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Sequence analysis and comparison

Sig2GRN: A Software Tool Linking Signaling Pathway with Gene Regulatory Network for Dynamic Simulation

Cell-Cell Communication in Development

Biology 211 (2) Week 1 KEY!

1. In most cases, genes code for and it is that

Richik N. Ghosh, Linnette Grove, and Oleg Lapets ASSAY and Drug Development Technologies 2004, 2:

Killingly Public Schools. Grade 10 Draft: March 2004

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Field 045: Science Life Science Assessment Blueprint

ADVANCED PLACEMENT BIOLOGY

AP* Biology Prep Course

Enduring understanding 1.A: Change in the genetic makeup of a population over time is evolution.

Transport between cytosol and nucleus

Evolution Unit: What is Evolution?

Transcription:

Framework for a rotein Ontology TMBIO November 2006 Darren A. Natale, h.d. rotein Science Team Lead, IR Research Assistant rofessor, GUMC

GO: ontologies that pertain, in part, to the locations, the processes, and the functions of proteins SI-MOD: ontology that describe the possible modifications to protein amino acid residues SO: ontology that can describe the possible causes of protein sequence, expression, or structure changes DO: ontology that can describe the possible effects of protein sequence, expression, or structure changes

Mothers against decapentaplegic homolog 2 GO annotation of SMAD2_HUMAN: Cellular Component: - nucleus Molecular Function: - protein binding Biological rocess: - signal transduction - regulation of transcription, DNA-dependent

TGF-β II I TGF-beta receptor 1 phosphorylation Smad 4 2 complex formation Smad 4 3 nuclear translocation Nucleus Smad 4 4 DNA binding Transcription Regulation

TGF-β II I TGF-beta receptor 1 phosphorylation Smad 4 ERK1 2 complex formation Smad 4 3 nuclear translocation Nucleus Smad 4 4 DNA binding Transcription Regulation ++

CAMK2 TGF-β II I TGF-beta receptor 1 phosphorylation Smad 4 2 complex formation Smad 4 3 nuclear translocation Cytoplasm Nucleus Transcription Regulation

normal Cytoplasmic RO:00000011 SMAD2_HUMAN TGF-β receptor phosphorylated Forms complex Nuclear Txn upregulation RO:00000013 SMAD2_HUMAN ERK1 phosphorylated Forms complex Nuclear Txn upregulation++ RO:00000014 SMAD2_HUMAN CAMK2 phosphorylated Forms complex Cytoplasmic No Txn upregulation RO:00000015 SMAD2_HUMAN alternatively spliced short form Cytoplasmic RO:00000016 SMAD2_HUMAN phosphorylated short form Nuclear Txn upregulation RO:00000018 SMAD2_HUMAN x point mutation (causative agent: large intestine carcinoma) Doesn t form complex Cytoplasmic No Txn upregulation RO:00000019 SMAD2_HUMAN

Important Considerations Need to consider the various forms a protein might take Need to provide connections to established ontologies Need to account for the possibility that a protein might not share the traits of its parent or siblings

%RO:00000010 Smad2 <RO:00000011 Smad2 sequence 1 (long form) >RO:00000012 Smad2 sequence 1 phosphorylated form %RO:00000013 Smad2 sequence 1, TGF-β receptor I-phosphorylated %RO:00000014 Smad2 sequence 1, TGF-β receptor I and ERK1-phosphorylated has_modification MOD:O-phosphorylated L-serine has_modification MOD:O-phosphorylated L-threonine has_function GO: TGF-β receptor, pathway-specific cytoplasmic mediator activity has_function GO:SMAD binding has_function GO:transcription coactivator activity participates_in GO:signal transduction participates_in GO:SMAD protein heteromerization participates_in GO:regulation of transcription, DNA-dependent located_in GO:nucleus part_of GO:transcription factor complex %RO:00000015 Smad2 sequence 1, TGF-β receptor I and CAMK2-phosphorylated <RO:00000016 Smad2 sequence 2 (short form) - splice variant >RO:00000017 Smad2 sequence 2 phosphorylated form %RO:00000018 Smad2 sequence 2, TGF-β receptor I-phosphorylated <RO:00000019 Smad2 sequence 3 - genetic variant related to colorectal carcinoma has_agent SO: amino_acid_substitution lacks_modification MOD: phosphorylated residue lacks_function GO: transcription coactivator activity agent_of DO: carcinoma of the large intestine % is_a < variant_of > derives_from

Important Considerations Need to consider the various forms a protein might take Need to provide connections to established ontologies Need to account for the possibility that a protein might not share the traits of its parent or siblings Need to take advantage of model organism data to generate hypotheses about human biology

Implications of rotein Evolution Human Chimp Mouse Rat Fly Worm Yeast E.coli B.subtilis Conclusions from experiments performed on proteins from one organism are often applicable to the homologous protein from another organism. Information learned about existing proteins allows us to infer the properties of ancestral proteins.

rotein Evolution Sequence changes Domain shuffling With enough similarity, one can trace back to a common origin What about these?

Levels of rotein Classification Evolutionary Divergence End-to-End Alignment Example Conservation Database Very Ancient Domain structure Very general biochemical activity SCO Superfamily Ancient Domain sequence CVSGSGNNTSITATGGVVDLQSSSAVKVRSTK CYKSG---IQVRLGEDNINVVEGNEQFISASK General biochemical activity fam *....:. :::... : ::* Recent rotein sequence Specific biochemical activity IRSF Domain architecture Functionally Specialized rotein sequence Domain CYKSRIQVRLGEHNIDVLEGNEQFINAAKIIT CYKSGIQVRLGEDNINVVEGNEQFISASKSIV **** *******.**:*:*******.*:* *. Specific biological function ANTHER architecture

TGM3 & EB42 TGM3 = rotein-glutamine gamma-glutamyltransferase (Transglutaminase; involved in protein modification) EB42 = Erythrocyte membrane protein band 4.2 (Constituent of cytoskeleton; involved in cell shape) Q: Are these related? Q: What is known? Q: How to capture?

Gene Duplication (TGM3/EB42 split) TGM3 branch EB42 branch Speciation (Human/mouse split) Human Mouse Human Mouse TGM3 (Human) TGM3 (Mouse) EB42 (Human) EB42 (Mouse) TGM3 (Human) TGM3 (Mouse) EB42 (Human) EB42 (Mouse) GO:0008360 cell shape (biological process) GO:0005200 constituent of cytoskelton (molecular function) has_ancestral_property has_ancestral_property TGM3 (Human) EB42 (Human) IRSF000459 TGM3 (Mouse) EB42 (Mouse) THR11590:SF11 THR11590:SF5 lacks_ancestral_property lacks_ancestral_property has_ancestral_property has_ancestral_property has_part GO:0006464 rotein Modification (biological process) F00868 (domain) F01841 (domain) F00927 (domain) GO:0003810 TGase activity (molecular function)

RO Root level Unit Level The two types of evolutionary units Not substituted by any other terms Domain Family Level (structure) Related by structural similarity Source: SCO Superfamily Domain Family Level (sequence) Related by sequence similarity Source: fam domain is_a domain is_a structure domain is_a sequence domain rotein Family Level Evolutionarily-related full-length proteins May contain finer-grain sub-categories Sources: IRSF family/subfamily, anther subfamily rotein Modification Level rotein as modified after translation Source: UnirotKB evolutionary unit has_part is_a lacks rotein Sequence Level ossible sequence forms derived from genetic variation or splicing Source: UnirotKB is_a protein is_a whole protein is_a sequence form derives_from cleaved and/or modified product GO Gene Ontology molecular function has_ancestral_property has_function lacks_function biological process has_ancestral_property participates_in cellular component has_ancestral_property part_of (for complexes) located_in (for compartments) DO/UMLS SO disease agent_of SI-MOD protein modification has_modification Disease Sequence Ontology sequence changes has_agent (sequence change) agent_of (effect on function) Modification

RO Team (so far ) rinciple Investigators Cathy Wu Judith Blake Hongfang Liu Barry Smith Curators Darren Natale Cecilia Arighi Winona Barker Zhang-zhi Hu (IR at GUMC) (The Jackson Laboratory) (GUMC) (SUNY Buffalo) (IR at GUMC) (IR at GUMC) (IR at GUMC) (IR at GUMC)