Homology and Information Gathering and Domain Annotation for Proteins

Outline
- Homology
- Information gathering for proteins
- Domain annotation for proteins
- Examples and exercises

The concept of homology
"The same organ in different animals under every variety of form and function." (Richard Owen, 1843)
Figure: Homologous forelimbs (http://bytesizebio.net/wp-content/uploads/2009/07/homology-limbs)

Homology: alikeness because of common ancestry
- Homology: the relationship of any two characters that have descended, with divergence, from a common ancestral character (common ancestry)
- Analogy: the relationship of any two characters that have descended convergently from unrelated ancestors (convergent evolution)
- Characters exist at very different levels of biological organization, ranging from entire organs through genes and domains to single nucleotides
- Homology is a qualitative concept (all-or-none): two characters either are homologous or are not
- Homology is not precisely defined
Figure labels: pterosaur, bat, bird (http://upload.wikimedia.org/wikipedia/commons/3/38/homology.jpg)
Source: Steven M. Carr, 2009, http://www.mun.ca/biology/scarr/molecular_homology_&_analogy.html

Subtypes of homology
Three disjoint subtypes:
- Orthology: two homologous characters separated by a speciation event
- Paralogy: two homologous characters arising from a duplication event
- Xenology: two homologous characters whose history involves interspecies (horizontal) transfer of genetic material
Figure labels: horizontal transfer, speciation, duplication
Source: Walter M. Fitch, Trends in Genetics, 2000
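The three subtypes can be read as a small decision rule. The sketch below is only an illustration of the definitions above, not a method from the slides: the function name and the string labels for events are assumptions, and it presumes the evolutionary history of the two characters is already known (in practice that history has to be inferred, for example from a gene tree).

```python
def homology_subtype(divergence_event, horizontal_transfer=False):
    """Classify a pair of homologous characters (illustrative sketch).

    divergence_event: "speciation" or "duplication", the event at which the
        two characters diverged (hypothetical labels chosen for this example).
    horizontal_transfer: True if the history of the pair involves interspecies
        (horizontal) transfer of genetic material.
    """
    # Xenology is checked first because the subtypes are disjoint:
    # any horizontal transfer in the history makes the pair xenologous.
    if horizontal_transfer:
        return "xenology"
    if divergence_event == "speciation":
        return "orthology"
    if divergence_event == "duplication":
        return "paralogy"
    raise ValueError(f"unknown divergence event: {divergence_event!r}")


if __name__ == "__main__":
    print(homology_subtype("speciation"))
    print(homology_subtype("duplication"))
    print(homology_subtype("speciation", horizontal_transfer=True))
```

Checking horizontal transfer before the divergence event is a design choice that keeps the three outcomes mutually exclusive, as the slide requires.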

The protein domain is a basic evolutionary module and an important unit of homology
- Definition: a polypeptide segment capable of autonomous folding
- Many proteins are multi-domain proteins
- Many domains are found in different contexts (domain shuffling)
- Exons in eukaryotic genomes often correspond to domains
- Therefore, protein classification schemes build on domains, not on entire proteins
Source: Söding & Lupas, Bioessays, 2003

Assessment of homology in proteins
- Homology is assessed by comparing sequence, structure, and function
- Sequence similarity is the primary marker of homology
- Because protein structure space is relatively small, similar structures are more likely than similar sequences to originate by convergence
- However, structure diverges more slowly than sequence and therefore allows more distant relationships to be recognized
- Functional residues within an active site are often the most highly conserved positions in a protein sequence
Figure labels: Sequence, Structure, Function
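One simple way to put a number on sequence similarity is percent identity over an existing pairwise alignment. The sketch below is an illustration under stated assumptions rather than the procedure used in the slides: it expects two already aligned strings of equal length, treats "-" as the gap character, and divides by the number of gap-free columns (other tools use different denominators). It sets no threshold for inferring homology, because the slides give none.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap.
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences invented for this example, not real proteins.
    print(percent_identity("ACDE", "ACDF"))
```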

Information gathering and domain annotation for proteins
- Databases and servers
- Domain annotation

A variety of databases enable information gathering about your protein of interest
- Run by different research institutions
- Allow free information retrieval for academic purposes
- Range from broad, general-purpose databases (UniProt or NCBI) to databases that specialize in particular aspects (e.g. hierarchical structural classification)

The National Center for Biotechnology Information (NCBI) at the National Institutes of Health in the US
- The NCBI advances science and health by providing access to biomedical and genomic information
- Contains numerous popular resources:
  - PubMed (life science literature)
  - Sequences (whole genomes to individual proteins)
  - Gene expression data
  - Taxonomy
  - Numerous tools, most importantly BLAST for homology detection
- A good starting point for an analysis

Protein classifications generate order among the tremendous diversity of proteins
- Sequence-based domain classifications (grouping is based on homology inferred from detectable sequence similarity):
  - SMART: emphasizes signaling domains, fast
  - Pfam: a comprehensive database for classifying newly found domains into domain families
- Structure-based classification schemes:
  - CATH: Class, Architecture, Topology, Homologous superfamily
  - SCOP: Structural Classification of Proteins (Class, Fold, Superfamily, Family)
- Homology is not a criterion on all levels of classification
- In contrast to cellular life, proteins are polyphyletic
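The point that homology is not a criterion on every level can be made concrete with the four SCOP levels listed above. The following is a minimal sketch, not part of SCOP or of the slides: it assumes that the superfamily and family levels group domains by probable common ancestry, while class and fold describe structural similarity only. The function names and the classification labels are hypothetical.

```python
SCOP_LEVELS = ("class", "fold", "superfamily", "family")

# Assumption for this sketch: only these levels imply common ancestry.
HOMOLOGY_LEVELS = ("superfamily", "family")


def deepest_shared_level(a, b):
    """Return the deepest SCOP level two domains share, or None.

    a, b: 4-tuples of labels ordered as in SCOP_LEVELS.
    """
    shared = None
    for level, label_a, label_b in zip(SCOP_LEVELS, a, b):
        if label_a != label_b:
            break
        shared = level
    return shared


def homology_implied(a, b):
    """True if the shared classification level implies homology."""
    return deepest_shared_level(a, b) in HOMOLOGY_LEVELS


if __name__ == "__main__":
    # Hypothetical labels, not real SCOP identifiers.
    dom1 = ("c1", "f1", "s1", "fam1")
    dom2 = ("c1", "f1", "s2", "fam9")
    print(deepest_shared_level(dom1, dom2), homology_implied(dom1, dom2))
```

Two domains that share only a class or a fold look alike structurally, but under this reading the classification alone does not establish that they are homologous.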

Example 1: Annotate domains in LRRK2 (human)
- Obtain the sequence in FASTA format [1] from the NCBI [2]
- Enter the name of the protein (LRRK2) in UniProt [3] and see all the information one can retrieve there
- Submit the sequence to domain databases such as SMART [4] or Pfam [5] and mark the identified domains in your log file
[1] FASTA: a widely used plain-text file format for sequence data
[2] NCBI: search for "ncbi" or go to http://www.ncbi.nlm.nih.gov/
[3] UniProt: search for "uniprot" or go to http://www.uniprot.org/
[4] SMART: search for "embl smart" or go to http://smart.embl-heidelberg.de/
[5] Pfam: search for "pfam" or go to http://pfam.sanger.ac.uk/
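Because the exercise starts from a FASTA file, it can help to see how little structure the format has: a header line beginning with ">" followed by one or more sequence lines. The parser below is a minimal sketch written for this walkthrough, using only the Python standard library. The function name is an assumption, and the record in the usage example is a toy sequence, not LRRK2.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of header -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    toy = ">toy_protein hypothetical example\nMKT\nAYI\n"
    for name, seq in parse_fasta(toy).items():
        print(name, len(seq), seq)
```

The sequence string obtained this way is what gets pasted into the SMART or Pfam search forms.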

Example 2: Annotate domains in NarX (E. coli)
[1] FASTA: a widely used plain-text file format for sequence data
[2] NCBI: search for "ncbi" or go to http://www.ncbi.nlm.nih.gov/
[3] UniProt: search for "uniprot" or go to http://www.uniprot.org/
[4] SMART: search for "embl smart" or go to http://smart.embl-heidelberg.de/
[5] Pfam: search for "pfam" or go to http://pfam.sanger.ac.uk/
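For the log file mentioned in the exercises, one possible layout is one line per identified domain, sorted by start position, with the database that reported it. This is only a sketch of such a log: the `DomainHit` type, the `format_domain_log` function, and the tab-separated layout are assumptions, and the domain names and coordinates in the usage example are placeholders, not real results for LRRK2 or NarX.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    name: str    # domain name as reported by the database
    start: int   # first residue of the domain (1-based)
    end: int     # last residue of the domain (inclusive)
    source: str  # database that reported the hit, e.g. "SMART" or "Pfam"


def format_domain_log(protein, hits):
    """Return a plain-text log: a header line, then one line per domain."""
    lines = [f"# {protein}"]
    for hit in sorted(hits, key=lambda h: h.start):
        lines.append(f"{hit.start}-{hit.end}\t{hit.name}\t{hit.source}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder hits invented for illustration only.
    hits = [
        DomainHit("domain_B", 120, 200, "Pfam"),
        DomainHit("domain_A", 10, 90, "SMART"),
    ]
    print(format_domain_log("toy_protein", hits))
```

Recording the source database next to each hit makes it easy to see where SMART and Pfam agree or differ on domain boundaries.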