Homology and Information Gathering and Domain Annotation for Proteins

Outline
  WHAT IS HOMOLOGY?
  HOW TO GATHER KNOWN PROTEIN INFORMATION?
  HOW TO ANNOTATE PROTEIN DOMAINS?
  EXAMPLES AND EXERCISES

Homology, or why knowledge transfer between organisms works

The concept of homology
  "The same organ in different animals under every variety of form and function." Richard Owen, 1843
  Figure: Homologous forelimbs (http://bytesizebio.net/wp-content/uploads/2009/07/homology-limbs)

Homology: likeness due to common ancestry
  HOMOLOGY: The relationship of any two characters that have descended with divergence from a common ancestral character
  ANALOGY: The relationship of any two characters that have descended convergently from unrelated ancestors
  CHARACTERS: Exist on very different levels of biological organization, e.g. entire organs, genes, domains, single nucleotides
  Homology is a qualitative concept (all-or-none): two characters are either homologous or they are not.
  Figure: pterosaur, bat, bird (http://upload.wikimedia.org/wikipedia/commons/3/38/homology.jpg)
  Source: Steven M. Carr, 2009, http://www.mun.ca/biology/scarr/molecular_homology_&_analogy.html

Three homology subtypes
  ORTHOLOGY (speciation): Two homologous characters separated by a speciation event
  PARALOGY (duplication): Two homologous characters arising from a duplication event
  XENOLOGY (horizontal transfer): Two homologous characters whose history involves inter-species (horizontal) transfer of genetic material
  Source: Walter M. Fitch, Trends in Genetics, 2000
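The three definitions can be sketched on a toy gene tree. The following is a minimal illustration, not a reconciliation method: it assumes every internal node is already labelled with the event that split its lineages, and it labels each pair of genes by the event at their last common ancestor only. The Node class, the gene names and the tree itself are hypothetical. Xenology in Fitch's sense covers any horizontal transfer in the shared history, so checking only the last common ancestor is a simplification.

```python
from itertools import combinations

# Hypothetical toy gene tree. Internal nodes carry the event that split
# their child lineages; leaves carry a gene name and no event.
class Node:
    def __init__(self, name=None, event=None, children=()):
        self.name = name              # leaf label, None for internal nodes
        self.event = event            # "speciation", "duplication" or "transfer"
        self.children = list(children)

LABELS = {
    "speciation": "orthologs",
    "duplication": "paralogs",
    "transfer": "xenologs",           # simplification: only the LCA event is checked
}

def leaves(node):
    """Return the names of all leaves below (or at) this node."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves(child)]

def classify_pairs(node, result=None):
    """Label every pair of leaves by the event at their last common ancestor."""
    if result is None:
        result = {}
    # Leaves in different child subtrees have this node as their LCA.
    for left, right in combinations(node.children, 2):
        for x in leaves(left):
            for y in leaves(right):
                result[frozenset((x, y))] = LABELS[node.event]
    for child in node.children:
        classify_pairs(child, result)
    return result

# An ancient duplication (copies A and B), followed by a speciation
# that separates human and mouse in each copy.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("human_A"), Node("mouse_A")]),
    Node(event="speciation", children=[Node("human_B"), Node("mouse_B")]),
])

for pair, relation in sorted(classify_pairs(tree).items(), key=lambda kv: sorted(kv[0])):
    print(" / ".join(sorted(pair)), "->", relation)
```

In this toy tree human_A and mouse_A come out as orthologs, while human_A and mouse_B come out as paralogs even though they sit in different species, because their last common ancestor is the duplication.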

Protein domains as evolutionary modules and homology units
  PROTEIN DOMAIN: A polypeptide chain capable of autonomous folding
  Many proteins comprise multiple domains
  Many domains are found in different contexts (domain shuffling)
  Most classification schemes build on domains, not on entire proteins
  Source: Söding & Lupas, Bioessays, 2003

Homology assessment in proteins is similarity based
  1. SEQUENCE SIMILARITY: The primary marker of homology. Because sequences change constantly, similarity well beyond what chance would produce points to common ancestry.
  2. STRUCTURAL SIMILARITY: Similar structures are more likely than similar sequences to arise by convergence, because protein structure space is comparatively small. But structure diverges more slowly than sequence and thus helps to recognize more distant relationships.
  3. FUNCTIONAL RESIDUES: Residues with a functional role, such as those within active sites, are often the most highly conserved positions in a protein sequence.
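As a small illustration of a sequence similarity measure, the sketch below computes percent identity over two already aligned sequences. The function name and the toy sequences are hypothetical, and the denominator (columns without a gap) is one of several conventions in use. Identity quantifies similarity only; homology still has to be inferred from it, for example with the statistics reported by a search tool such as BLAST.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue                  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Hypothetical toy alignment: 8 gap-free columns, 6 of them identical.
print(percent_identity("MKT-AYIAK", "MRTGAYLAK"))  # 75.0
```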

Homology: what's in it for me?
  Works across species borders: the rationale behind using model organisms
  Transfers knowledge between proteins: a good starting point before any experiment
  Improves experimental results, e.g. improved thermostability by using a homolog from a thermophilic organism

Protein databases and analysis servers, or how to exploit existing knowledge

Current knowledge on proteins in online databases
  Offered by different research institutions
  Free information retrieval for academic purposes
  From broad all-around databases (e.g. UniProt and NCBI) to databases specialized in particular aspects (e.g. hierarchical structural classification)

The National Center for Biotechnology Information (NCBI)
  "The NCBI advances science and health by providing access to biomedical and genomic information."
  www.ncbi.nlm.nih.gov
  Numerous popular resources:
    PubMed (life science literature)
    Sequences (whole genomes to individual proteins)
    Gene expression data
    Taxonomy
  Numerous tools, most importantly BLAST for homology detection
  A good starting point for an analysis

The Universal Protein Resource (UniProt)
  "The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information."
  www.uniprot.org
  UniProtKB (knowledge base) has two sections:
    Swiss-Prot: manually annotated and reviewed
    TrEMBL: automatically annotated and not reviewed
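The Swiss-Prot/TrEMBL distinction is visible in the FASTA headers UniProtKB produces: the header starts with "sp" for Swiss-Prot entries and "tr" for TrEMBL entries, followed by the accession and the entry name, separated by "|". A minimal sketch of reading that prefix follows. The function name is an assumption, and both example headers, including their accessions and entry names, are made-up placeholders rather than real entries.

```python
def review_status(header):
    """Split a UniProtKB-style FASTA header into accession, entry name and section."""
    fields = header.lstrip(">").split("|", 2)
    if len(fields) < 3:
        raise ValueError("not a UniProtKB-style header: " + header)
    db, accession, rest = fields
    entry_name = rest.split()[0]      # the entry name ends at the first whitespace
    sections = {
        "sp": "Swiss-Prot (reviewed)",
        "tr": "TrEMBL (unreviewed)",
    }
    return accession, entry_name, sections.get(db, "unknown")

# Placeholder headers for illustration only; the accessions are not real entries.
for header in (
    ">sp|ACC00001|DEMO1_HUMAN Demo protein 1 OS=Homo sapiens",
    ">tr|ACC00002|DEMO2_ECOLI Demo protein 2 OS=Escherichia coli",
):
    print(review_status(header))
```

A check like this helps when deciding how much weight to give an annotation: reviewed entries have been manually curated, unreviewed ones have not.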

Classifications order proteins to tame their tremendous diversity
  SEQUENCE-BASED: grouping is based on homology inferred by detectable sequence similarity
    Pfam: comprehensive database classifying newly found domains into families
    SMART: annotation of known domains in proteins
  STRUCTURE-BASED: grouping mixes structural features (i.e. analogy) and homology
    CATH: Class, Architecture, Topology, Homologous superfamily
    SCOP (Structural Classification Of Proteins): Class, Fold, Superfamily, Family

Example: Annotate domains in LRRK2 (Human)
  1. Enter the name of the protein (LRRK2) in UniProt [1] and explore the retrieved information
  2. Obtain the sequence in FASTA [2] format from the NCBI [3]
  3. Submit the sequence to domain databases like SMART [4] or Pfam [5] and mark the identified domains in your log file

  [1] UniProt: google "uniprot" or www.uniprot.org
  [2] FASTA: a widely used plain text file format for sequence data
  [3] NCBI: google "ncbi" or www.ncbi.nlm.nih.gov
  [4] SMART: google "embl smart" or smart.embl-heidelberg.de
  [5] Pfam: google "pfam" or pfam.sanger.ac.uk
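The FASTA step and the log-file step of the example above can be sketched in a few lines. This is one possible way to do the bookkeeping, not part of the original exercise: parse_fasta and format_domain_log are hypothetical helper names, the tab-separated log layout is an assumption, and the sequence, domain names and coordinates are placeholders, not real LRRK2 data.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line (without '>') to sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)       # sequences may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records

def format_domain_log(protein, domains):
    """Format (name, start, end) domain hits as tab-separated lines, sorted by start."""
    ordered = sorted(domains, key=lambda hit: hit[1])
    return "\n".join(f"{protein}\t{name}\t{start}-{end}" for name, start, end in ordered)

# Placeholder input: a made-up record and made-up domain hits.
fasta_text = """>toy_protein example record
MKTAYIAKQR
QISFVKSHFS
"""
records = parse_fasta(fasta_text)
for header, sequence in records.items():
    print(header, len(sequence))

hits = [("domain_B", 12, 18), ("domain_A", 2, 9)]
print(format_domain_log("toy_protein", hits))
```

Recording start and end positions with each domain name makes it easy to compare what SMART and Pfam report for the same sequence.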

Exercise: Annotate domains in NarX (E. coli)
  1.

  [1] UniProt: google "uniprot" or www.uniprot.org
  [2] FASTA: a widely used plain text file format for sequence data
  [3] NCBI: google "ncbi" or www.ncbi.nlm.nih.gov
  [4] SMART: google "embl smart" or smart.embl-heidelberg.de
  [5] Pfam: google "pfam" or pfam.sanger.ac.uk