CSCE555 Bioinformatics. Protein Function Annotation


Why do we need function annotation?
Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007

What is function?
- The definition of biological function is ambiguous (context dependent).
- The biological function of a protein clearly has more than one aspect.

How to describe function in a computationally amenable way?
- Human language
- Controlled vocabulary
- EC (Enzyme Commission Classification)
    1.-.-.-  Oxidoreductases
    1.1.-.-  Acting on the CH-OH group of donors
    1.1.1.-  With NAD(+) or NADP(+) as acceptor
    1.1.1.1  Alcohol dehydrogenase
    1.1.1.3  Homoserine dehydrogenase
- GO (Gene Ontology): check the GO website
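Because an EC number is a four-level hierarchy, two enzymes can be compared by how many leading levels they share. The following is a minimal Python sketch of that idea; the function names `ec_levels` and `shared_ec_depth` are illustrative and do not belong to any EC tool.

```python
def ec_levels(ec_number):
    """List the EC classes an enzyme belongs to, from general to specific.

    Unspecified levels are written as '-', as in the EC listing above.
    """
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError("an EC number has exactly four fields")
    levels = []
    for depth in range(1, 5):
        if parts[depth - 1] == "-":
            break  # nothing more specific is given
        levels.append(".".join(parts[:depth] + ["-"] * (4 - depth)))
    return levels


def shared_ec_depth(ec_a, ec_b):
    """Count how many leading EC levels two numbers have in common (0 to 4)."""
    depth = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b or a == "-":
            break
        depth += 1
    return depth


if __name__ == "__main__":
    print(ec_levels("1.1.1.1"))
    print(shared_ec_depth("1.1.1.1", "1.1.1.3"))
```

Alcohol dehydrogenase (1.1.1.1) and homoserine dehydrogenase (1.1.1.3) share three levels under this measure: both are oxidoreductases acting on CH-OH groups with NAD(+) or NADP(+) as acceptor.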

Gene Ontology

What information can be used for function annotation?
- Sequence-based approaches: protein A has function X, and protein B is a homolog (ortholog) of protein A; hence B has function X.
- Structure-based approaches: protein A has structure X, and X has certain structural features; hence A's functional sites are ...
- Motif-based approaches (sequence motifs, 3D motifs): a group of genes have function X and they all have motif Y; protein A has motif Y; hence protein A's function might be related to X.
- Guilt-by-association: gene A has function X and gene B is often associated with gene A, so B might have a function related to X.
  - Associations: domain fusion, phylogenetic profiling, PPI, etc.
- Meta-approaches

Different ways of transferring functions Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007

Annotation transfer by homology
- Database searching using sequence-based alignment approaches
  - BLAST
  - PSI-BLAST, profile-profile alignment
  - hmmpfam against the Pfam database
- Significance evaluation in database searching
- Ortholog / paralog
  - Phylogeny analysis
  - Orthologs: usually the same function
  - Paralogs: often a different function
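As an illustration of annotation transfer from database hits, the sketch below takes the most significant annotated hit for each query and copies its function. It assumes the hits are already available as `(query, subject, evalue)` tuples, for example parsed from a tabular search report. The function name, the protein identifiers, and the E-value cutoff of 1e-5 are illustrative assumptions, not values from the slides. It does not separate orthologs from paralogs, which would need the phylogeny analysis mentioned above.

```python
def transfer_annotations(hits, known_function, max_evalue=1e-5):
    """Copy the function of the best annotated hit onto each query.

    hits           -- iterable of (query, subject, evalue) tuples
    known_function -- dict mapping annotated subject IDs to a function label
    max_evalue     -- assumed significance cutoff (illustrative only)
    """
    best = {}  # query -> (subject, evalue) of the best usable hit so far
    for query, subject, evalue in hits:
        if evalue > max_evalue or subject not in known_function:
            continue  # not significant, or nothing to transfer
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: known_function[subject]
            for query, (subject, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical identifiers and E-values, for demonstration only.
    hits = [
        ("q1", "s1", 1e-30),
        ("q1", "s2", 1e-50),
        ("q2", "s3", 0.5),
    ]
    known = {"s1": "alcohol dehydrogenase",
             "s2": "homoserine dehydrogenase",
             "s3": "uricase"}
    print(transfer_annotations(hits, known))
```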

Databases for searching
- Protein family databases
  - Pfam
  - PANTHER: A Library of Protein Families and Subfamilies Indexed by Function (http://www.pantherdb.org/)
  - SEED gene family
  - KEGG gene family
  - etc.

Similar vs. orthologous
[Figure: similar vs. orthologous genes; labels A1, B1, B2]

Structure-based function prediction
- Structure-based methods can detect remote homologues that sequence-based methods miss, because they use structural information in addition to sequence information.
- Protein threading (sequence-structure alignment) is a popular method.
- Structure-based methods can provide more than just homology information.

Structure-based function prediction
- Using a sequence-structure alignment method, one can predict that a protein belongs to a SCOP family, superfamily, or fold:
  - family (same function)
  - superfamily (similar functions)
  - fold (different functions)
- SCOP hierarchy: folds > superfamilies > families

Motif-based function prediction
- Sequence motif (pattern)
  - PROSITE (ScanPROSITE): domains/families/functional sites
  - BLOCKS: multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins
  - PRINTS: collection of protein fingerprints; a fingerprint is a group of conserved motifs used to characterize a protein family, which is more powerful than a single motif
- Motif finding: a well-defined bioinformatics problem
  - Alignment based / alignment independent
  - MEME

PROSITE & ScanPROSITE
- PROSITE contains patterns specific for more than a thousand protein families.
- PROSITE examples
  - PKC_PHOSPHO_SITE, PS00005; Protein kinase C phosphorylation site
    Consensus pattern: [ST] - x - [RK]
    S or T is the phosphorylation site
  - URICASE, PS00366; Uricase signature (PATTERN)
    [LV] - x - [LV] - [LIV] - K - [STV] - [ST] - x - [SN] - x - F - x(2) - [FY] - x(4) - [FY] - x(2) - L - x(5) - R
- ScanPROSITE scans a protein sequence for occurrences of the patterns and profiles stored in PROSITE.
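PROSITE patterns such as the two above map closely onto regular expressions, so a basic pattern scan can be sketched in a few lines of Python. This is a simplified illustration, not the ScanPROSITE implementation: `prosite_to_regex` and `scan` are hypothetical names, and rarer syntax such as anchors inside brackets is not handled. The test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression.

    Handles: x (any residue), [..] (allowed residues), {..} (forbidden
    residues), (n) and (n,m) repeats, and < / > terminal anchors.
    """
    pattern = pattern.replace(" ", "").rstrip(".")
    pieces = []
    for element in pattern.split("-"):
        # Order matters: free up '{' before reusing it for repeat counts.
        element = element.replace("{", "[^").replace("}", "]")
        element = element.replace("(", "{").replace(")", "}")
        element = element.replace("x", ".")
        element = element.replace("<", "^").replace(">", "$")
        pieces.append(element)
    return "".join(pieces)


def scan(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    pkc_site = "[ST] - x - [RK]"
    print(prosite_to_regex(pkc_site))
    print(scan(pkc_site, "MSARTGKAA"))  # made-up sequence
```

The lookahead wrapper `(?=(...))` is used so that overlapping sites are all reported, which matters for short patterns like the protein kinase C site.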

Transcription factors and motifs

Function prediction based on local structure patterns
- 3D motif (spatial patterns of residues)
- Clefts / pockets (prediction of ligand-binding sites)
  - For ~85% of ligand-binding proteins, the largest cleft is the ligand-binding site.
  - For an additional ~10% of ligand-binding proteins, the second-largest cleft is the ligand-binding site.

A typical example of a 3D motif: the catalytic triad
- A catalytic triad is a set of three amino acid residues found inside the active site of certain protease enzymes: serine (S), aspartate (D) and histidine (H).
- They work together to break peptide bonds in polypeptides.
- The residues of a catalytic triad can be far from each other in the primary structure, but are brought close together in the tertiary structure.
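The point that triad residues are distant in sequence but close in space can be illustrated with a small distance check. The sketch below assumes one representative coordinate per residue and a 10 Å cutoff; both are simplifying assumptions chosen for illustration, and the residue numbers and coordinates are invented. A real 3D-motif search would compare specific side-chain atoms and their geometry.

```python
import math
from itertools import product


def find_triads(residues, cutoff=10.0):
    """Find Ser/His/Asp triples whose members all lie within `cutoff`.

    residues -- list of (residue_name, sequence_number, (x, y, z))
    cutoff   -- assumed maximum pairwise distance, in the coordinate units
    """
    by_type = {"SER": [], "HIS": [], "ASP": []}
    for name, number, xyz in residues:
        if name in by_type:
            by_type[name].append((number, xyz))

    triads = []
    for ser, his, asp in product(by_type["SER"], by_type["HIS"], by_type["ASP"]):
        pairs = [(ser, his), (his, asp), (ser, asp)]
        if all(math.dist(a[1], b[1]) <= cutoff for a, b in pairs):
            triads.append((ser[0], his[0], asp[0]))
    return triads


if __name__ == "__main__":
    # Invented residues: three are spatially close, one serine is far away.
    toy = [
        ("SER", 200, (0.0, 0.0, 0.0)),
        ("HIS", 60, (3.0, 0.0, 0.0)),
        ("ASP", 105, (6.0, 1.0, 0.0)),
        ("SER", 10, (40.0, 40.0, 40.0)),
    ]
    print(find_triads(toy))
```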

Local structure pattern resources
- PINTS: Patterns In Non-homologous Tertiary Structures (3D motif)
  http://www.russell.embl.de/pints/
- ef-site: electrostatic-surface of Functional site, a database of molecular surfaces of proteins' functional sites, displaying the electrostatic potentials and hydrophobic properties together on the Connolly surfaces of the active sites
  http://ef-site.protein.osaka-u.ac.jp/ef-site/
- Catalytic Site Atlas
  http://www.ebi.ac.uk/thornton-srv/databases/csa/

Guilt-by-association
- Phylogenetic profiling (co-evolution pattern)
- Protein-protein interaction
- Domain fusion
- Genomic context
  - Neighbor genes (operon) / gene team
- Gene expression (protein expression level)
- etc.
- Integration
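One very simple way to act on guilt-by-association is to let the annotated neighbors of a protein vote on its function. The sketch below assumes the associations (from any of the sources listed above) have already been collected into an adjacency dictionary; the names `predict_by_neighbors`, `network` and `known` are illustrative, and the toy network is invented.

```python
from collections import Counter


def predict_by_neighbors(network, known, protein):
    """Predict a function as the most common label among annotated neighbors.

    network -- dict mapping each protein to an iterable of associated proteins
    known   -- dict mapping annotated proteins to a function label
    Returns None when no neighbor is annotated.
    """
    votes = Counter(known[n] for n in network.get(protein, ()) if n in known)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Invented network: B is associated with three proteins, two share a label.
    network = {"B": ["A", "C", "D"], "E": ["F"]}
    known = {"A": "phosphate transport",
             "C": "phosphate transport",
             "D": "cell division"}
    print(predict_by_neighbors(network, known, "B"))
    print(predict_by_neighbors(network, known, "E"))
```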

Phylogenetic profiling approach
- A non-homology-based approach that uses co-evolution patterns.
- The phylogenetic profile of a protein is a 0/1 string that encodes the presence (1) or absence (0) of the protein in every sequenced genome.
- Proteins that participate in a common structural complex or metabolic pathway are likely to co-evolve, so the phylogenetic profiles of such proteins are often similar.
- Similarity of phylogenetic profiles suggests similarity of function.

Phylogenetic profiling approach
- Genes with similar phylogenetic profiles have related functions or are functionally linked (Eisenberg and colleagues, 1999).
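Comparing phylogenetic profiles reduces to comparing 0/1 strings of equal length. The sketch below uses the Jaccard index (genomes containing both proteins divided by genomes containing either) as one possible similarity measure; the choice of measure, the 0.7 threshold, and the profiles themselves are illustrative assumptions rather than anything prescribed by the slides.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two 0/1 presence/absence strings."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(a == "1" and b == "1" for a, b in zip(profile_a, profile_b))
    either = sum(a == "1" or b == "1" for a, b in zip(profile_a, profile_b))
    return both / either if either else 0.0


def linked_pairs(profiles, threshold=0.7):
    """Return (protein, protein, score) for pairs with similar profiles."""
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        score = jaccard(prof_a, prof_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


if __name__ == "__main__":
    # Invented profiles over six genomes.
    profiles = {"P1": "110110", "P2": "110100", "P3": "001001"}
    print(linked_pairs(profiles))
```

Here P1 and P2 co-occur in three of the four genomes that contain either one, so they are reported as candidates for a functional link, while P3 never co-occurs with them.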

Sequence co-evolution

Gene (domain) fusion for PPI prediction
- Gene (domain) fusion is an effective method for predicting protein-protein interactions.
- If proteins A and B are homologous to two domains of a protein C, A and B are predicted to interact with each other.
- Also known as Rosetta stone methods.
- Gene fusion has low prediction coverage, but it also has a low false-positive rate.
[Figure labels: Genome A, Genome B, Genome C]
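The Rosetta stone rule can be sketched as follows: if two different proteins align to non-overlapping regions of the same composite protein, predict that they interact. The input format (`query, composite, start, end` tuples taken from some homology search), the function name, and the example coordinates are all assumptions made for illustration.

```python
from itertools import combinations


def rosetta_stone_pairs(hits):
    """Predict interacting pairs from hits against composite (fused) proteins.

    hits -- iterable of (query, composite, start, end), where start..end is
            the region of the composite protein that the query aligns to
    """
    by_composite = {}
    for query, composite, start, end in hits:
        by_composite.setdefault(composite, []).append((query, start, end))

    pairs = set()
    for regions in by_composite.values():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            non_overlapping = e1 < s2 or e2 < s1
            if q1 != q2 and non_overlapping:
                pairs.add(tuple(sorted((q1, q2))))
    return sorted(pairs)


if __name__ == "__main__":
    # Invented alignments: A and D cover the same region of C, B covers another.
    hits = [("A", "C", 1, 150), ("B", "C", 160, 320), ("D", "C", 10, 140)]
    print(rosetta_stone_pairs(hits))
```

Requiring non-overlapping regions is what separates "two domains of C" from two homologs of the same domain; A and D above hit the same region and are therefore not paired.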

Genomic-context based approaches Gene cluster

Functional inference at systems level
- Function prediction for individual genes can be made in the context of biological pathways/networks.
- By homology search, one can map a known biological pathway from one organism to another, and hence predict gene functions in the context of biological pathways/networks.
- Example: phoB is predicted to be a transcription regulator that regulates all the genes in the pho regulon (a group of coregulated operons); within this regulon, gene A interacts with gene B, etc.

Pathway reconstruction
1) A general pathway & a list of functional roles
2) Annotations in various species
[Figure: functional roles f1 to f6 mapped across Organism 1 to Organism 4]
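The two ingredients of pathway reconstruction, a list of functional roles and per-organism annotations, can be combined with a simple presence check. The sketch below reports which roles each organism covers and which are missing; the role names follow the f1 to f6 labels of the figure, while the gene names and annotations are invented.

```python
def pathway_coverage(roles, annotations):
    """Report present and missing pathway roles for each organism.

    roles       -- ordered list of functional roles in the general pathway
    annotations -- dict: organism -> {gene: functional role}
    Returns dict: organism -> (present_roles, missing_roles)
    """
    report = {}
    for organism, gene_roles in annotations.items():
        found = set(gene_roles.values())
        present = [role for role in roles if role in found]
        missing = [role for role in roles if role not in found]
        report[organism] = (present, missing)
    return report


if __name__ == "__main__":
    roles = ["f1", "f2", "f3", "f4", "f5", "f6"]
    # Invented annotations for two organisms.
    annotations = {
        "Organism 1": {"geneA": "f1", "geneB": "f2", "geneC": "f3"},
        "Organism 2": {"geneX": "f1", "geneY": "f4"},
    }
    for organism, (present, missing) in pathway_coverage(roles, annotations).items():
        print(organism, "present:", present, "missing:", missing)
```

Missing roles are the natural starting point for follow-up: either the organism lacks that step, or an unannotated gene fills it.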

Resource for functional inference at systems level: KEGG
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- KAAS: KEGG Automatic Annotation Server
- KEGG PATHWAY is a collection of manually drawn pathway maps representing our knowledge of the molecular interaction and reaction networks for:
  1. Metabolism
  2. Genetic Information Processing
  3. Environmental Information Processing
  4. Cellular Processes
  5. Human Diseases
  6. Drug Development

Integration of multiple data sources for function annotation SAMBA framework Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007

References
- Friedberg. Automated protein function prediction--the genomic challenge. Brief Bioinform. 7(3):225-42. 2006
- Sharan et al. Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007