Multiple Alignment using Hydrophobic Clusters : a tool to align and identify distantly related proteins


J. Baussand, C. Deremble, A. Carbone
Analytical Genomics, Laboratoire d'Immuno-Biologie Cellulaire et Moléculaire des infections parasitaires, INSERM U511
91, boulevard de l'Hôpital, 75013 Paris, France

Alignment and Homology Search Reliability (1)
A) Structure: good reliability
B) Sequence:
MFPDAHCELVHRNPFELLIAVVLSAQ
MFPDAHCEL -VHRNPFELLIAVVLSAQ
PGLRLPGC-----VDAFEQGVRAILGQL
YHDNEWGVPETDSKKLFEMICLEGQQAG
(E.E. Hill and S.E. Brenner, IPAM Structural Proteomics, 2004)

Alignment and Homology Detection Reliability (2)
Introduction of structural information in:
1) sequences
2) alignment parameters:
 - substitution matrices (Overington et al., 1992; Teodorescu et al., 2004)
 - gap penalties (Lesk et al., 1986)

Hydrophobic Clusters
 - Hydrophobic core: folding stability
 - Surface: protein interactions

Prediction of Hydrophobic Clusters in Sequences
 - HCA: α-helical 2D representation
 - 1D representation (N-ter to C-ter)
 - Specific periodicity: +/- 1, (2,) 3, 4
 - Manual alignment (Gaboriaud et al., 1987)
 - Automatic alignment (Baussand, Deremble and Carbone, in preparation)
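A minimal sketch of how hydrophobic clusters might be delimited on the 1D representation. Several points are assumptions, not statements from the slide: the hydrophobic alphabet VILFMYW (the usual HCA choice), proline acting as a cluster breaker, and a linking rule that joins two hydrophobic residues when their separation is 1, 3 or 4 (2 being optional, following the periodicity listed above). The function name `hydrophobic_clusters` is hypothetical and this is not the authors' implementation.

```python
# Assumed hydrophobic alphabet (classical HCA choice, not given on the slide).
HYDROPHOBIC = set("VILFMYW")


def hydrophobic_clusters(seq, separations=(1, 3, 4), breaker="P"):
    """Group hydrophobic positions of `seq` into clusters.

    Two hydrophobic residues are linked when their separation is listed in
    `separations` and no `breaker` residue lies between them. Returns a
    sorted list of clusters, each a sorted list of 0-based positions.
    """
    seq = seq.upper()
    n = len(seq)
    parent = list(range(n))  # union-find forest over sequence positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, aa in enumerate(seq):
        if aa not in HYDROPHOBIC:
            continue
        for d in separations:
            j = i + d
            if j < n and seq[j] in HYDROPHOBIC and breaker not in seq[i:j]:
                parent[find(j)] = find(i)  # merge the two clusters

    groups = {}
    for i, aa in enumerate(seq):
        if aa in HYDROPHOBIC:
            groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(hydrophobic_clusters("AVLIAGGGGGGWFA"))
```

Passing `separations=(1, 2, 3, 4)` enables the optional separation of 2. With this simple linking rule two clusters can interleave along the sequence, so positions are returned instead of start and end coordinates.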

Hydrophobic Clusters Properties
Data set: 145 protein families, 613 sequences (< 30 % pairwise identity)
% RSS overlapping HC: 89.8%
% HC overlapping RSS: 85.7%

                                     RSS    HC     THC    FHC
Mean length                          8.4    8.1    8.8    4.2
Mean % solvent accessible surface    25.3   24.4   23.7   33.2
% Identity                           20.0   20.6   21.3   27.0
% Hydrophobic position conserved     61.1   72.0   72.1   72.0

Alignment of Sequences using Hydrophobic Clusters
 - All residues under the same evolution pressure: one substitution matrix, one set of gap penalties
 - Evolution pressure in structure > out of structure:
   - structure-specific substitution matrix and gap penalties
   - out-of-structure-specific substitution matrix and gap penalties
[Figure: example sequences with residues marked "in HC" and "out of HC"]
Thompson et al., 1995
48 substitution matrices (24 in structure, 24 out of structure)
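The two-regime scoring idea can be sketched as a function that scores an existing gapped alignment. Everything specific here is an assumption for illustration: a column counts as "in HC" only when both residues belong to a cluster, a gap takes the cluster penalties (`cgop`, `cgep`) when the residue facing it is in a cluster, a gap of length n costs open + n × extension, and `toy_in` / `toy_out` are toy scoring functions standing in for the in-structure and out-of-structure matrices of the slide.

```python
HYDROPHOBIC = set("VILFMYW")  # assumed alphabet, used only by the toy matrices


def toy_out(x, y):
    """Toy out-of-structure scores (illustrative values only)."""
    return 2 if x == y else -1


def toy_in(x, y):
    """Toy in-structure scores: hydrophobic conservation is rewarded more."""
    if x == y:
        return 4
    return 1 if x in HYDROPHOBIC and y in HYDROPHOBIC else -2


def score_alignment(a, b, hc_a, hc_b, m_in, m_out, gop, gep, cgop, cgep):
    """Score gapped strings `a` and `b` with cluster-dependent parameters.

    `hc_a` and `hc_b` are sets of ungapped 0-based positions that lie in
    hydrophobic clusters of each sequence.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    score = 0
    ia = ib = 0        # ungapped positions reached in a and b
    prev_gap = None    # which sequence carried a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            gapped = "a" if x == "-" else "b"
            # The residue facing the gap decides which penalties apply.
            inside = (ib in hc_b) if gapped == "a" else (ia in hc_a)
            open_pen, ext_pen = (cgop, cgep) if inside else (gop, gep)
            score -= ext_pen + (open_pen if prev_gap != gapped else 0)
            prev_gap = gapped
        else:
            matrix = m_in if (ia in hc_a and ib in hc_b) else m_out
            score += matrix(x, y)
            prev_gap = None
        ia += x != "-"
        ib += y != "-"
    return score


if __name__ == "__main__":
    print(score_alignment("VL-A", "VLGA", {0, 1}, {0, 1},
                          toy_in, toy_out, gop=3, gep=1, cgop=6, cgep=2))
```

An aligner would use the same column rule inside its dynamic programming recursion; scoring a fixed alignment keeps the sketch short.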

Evaluation of the HC Fitting Matrices (1)
Data: 8 couples of homologous proteins with reference alignments
 - 24 matrices in structure, 24 matrices out of structure; gap opening penalty (GOP) and gap extension penalty (GEP): 0 to 15, with cgop = GOP and cgep = GEP
 - 4 matrices (HSDM, Blosum30, Blosum62, Gonnet); GOP, GEP: 0 to 15
Measure: % Correctly Aligned Pairs (% CAP)
Outputs: results with optimized parameters for each couple; alignment landscape
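A minimal sketch of the % CAP measure, assuming it is the percentage of residue pairs aligned in the reference alignment that are also aligned in the tested alignment (the slides do not spell out the definition). The function names are hypothetical.

```python
def aligned_pairs(a, b):
    """Return the set of (i, j) ungapped positions placed in the same column."""
    pairs = set()
    i = j = 0
    for x, y in zip(a, b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        i += x != "-"
        j += y != "-"
    return pairs


def percent_cap(test, reference):
    """Percentage of reference residue pairs recovered by the test alignment.

    `test` and `reference` are (gapped_a, gapped_b) tuples over the same
    two sequences.
    """
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return 100.0 * len(ref & aligned_pairs(*test)) / len(ref)


if __name__ == "__main__":
    reference = ("ACDE", "ACDE")
    print(percent_cap(("AC-DE", "ACD-E"), reference))
```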

Evaluation of the HC fitting matrices (2)
CpG binding proteins (α/β, 24% identity)
[Figure: landscape of the % CAP according to gap penalties (axes: GOP, GEP); panel label: Blosum62; values marked on the panels: 66 %, 67 %]

Evaluation of the HC fitting matrices (2)
[Figure: landscape of the % CAP according to gap penalties (axes: GOP, GEP) for HSDM and Blosum62; one panel per couple: α 26%, α 13%, α 5%, β 13%, β 9%, β 11%, α/β 24%, α 16%]

Evaluation of the HC fitting matrices (3)
Parameters for best average on the 8 couples:

Matrix                          GOP   GEP   Mean % CAP
2 matrices                      14    2     55.4
HSDM (Prlic et al., 2000)       1     11    48.3
Blosum62 (Henikoff, 1992)       4     0     48.0
Gonnet (Gonnet et al., 1992)    0     2     46.0
Blosum30 (Henikoff, 1992)       2     0     41.9
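Selecting the parameters with the best average over several couples can be sketched as a grid scan followed by a mean and an argmax. The names `scan_landscape`, `average_landscape` and `best_parameters` are hypothetical, and `cap_fn` stands for any callable that aligns one couple with the given penalties and returns its % CAP; the two toy functions below merely mimic such landscapes.

```python
def scan_landscape(cap_fn, penalties=range(16)):
    """Evaluate cap_fn(gop, gep) on the full grid (0 to 15 by default)."""
    return {(gop, gep): cap_fn(gop, gep)
            for gop in penalties for gep in penalties}


def average_landscape(landscapes):
    """Mean % CAP per (gop, gep) over a list of per-couple landscapes."""
    keys = sorted(landscapes[0])
    for grid in landscapes:
        if sorted(grid) != keys:
            raise ValueError("all landscapes must cover the same grid")
    return {k: sum(grid[k] for grid in landscapes) / len(landscapes)
            for k in keys}


def best_parameters(landscape):
    """Return ((gop, gep), value) for the highest value in the landscape."""
    return max(landscape.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy stand-ins for two couples (not real alignment results).
    def couple_1(gop, gep):
        return 100 - 5 * abs(gop - 4) - 5 * abs(gep - 1)

    def couple_2(gop, gep):
        return 100 - 10 * abs(gop - 6) - 5 * abs(gep - 1)

    grids = [scan_landscape(couple_1), scan_landscape(couple_2)]
    print(best_parameters(average_landscape(grids)))
```

The parameters that maximise the average need not coincide with the optimum of any single couple, which is why the per-couple and averaged results are reported separately.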

Evaluation of the HC fitting matrices (4)
[Figure: average landscape of the % CAP for Blosum62 and HSDM (axes: GOP, GEP)]

Tests for Evaluation of the HC fitting approach
Data: 8 couples of proteins with reference alignments
 - 1 matrix out of structure with GOP, GEP: 0 to 15; 1 matrix in structure with cgop, cgep: 0 to 15
 - 4 matrices (HSDM, Blosum30, Blosum62, Gonnet); GOP, GEP: 0 to 15
Measure: % Correctly Aligned Pairs (% CAP)
Output: alignment results with optimized parameters for each couple

Evaluation of the HC fitting gap penalties
Best results with optimized parameters for the 8 couples (% CAP):

Structural class                α/β     α       α       α       α       β       β       β
% Sequence Id                   24%     26%     16%     13%     5%      13%     9%      11%     Mean
matrices                        76.3    97.4    87.8    80.0    36.3    74.0    27.5    45.2    65.8
(change shown on slide)         +8.3%   +8.9%   =       +10.8%  +1.5%   +1.6%   =       =
HSDM (Prlic et al., 2000)       73.4    86.6    87.2    54.5    7.0     50.8    23.7    38.2    52.7
Blosum62 (Henikoff, 1992)       66.0    92.3    77.2    70.8    41.7    55.1    31.1    32.8    58.3
Gonnet (Gonnet et al., 1992)    61.3    90.3    73.3    70.8    8.7     46.9    31.5    39.5    52.8
Blosum30 (Henikoff, 1992)       76.1    81.2    69.0    48.5    16.1    48.0    21.0    30.1    48.7

Sequence Alignment Approaches Comparison
Plastocyanin / Azurin (β, 13% Id)
 - HSDM
 - HCA + manual alignment (Gaboriaud et al., 1987)

Improvement of Remote Protein Alignment
 - HC fitting approach: improvement of pairwise sequence alignment for distantly related proteins
 - Multiple alignment: distance matrix for the phylogenetic guide tree
 - Homology detection
Usually: alignment score

Evaluation of Homology and Phylogenetic Distance
[Figure: CpG binding proteins, landscape of % CAP and of score for Blosum62 and HSDM]
Alignment score + evaluation of hydrophobic cluster superimposition (SOV, Zemla et al., 1993)
 - Detect homologous proteins (< 30 % identity)
 - Evaluate the distance between 2 sequences

Perspectives
 - Target sequence
 - Local database (Trembl, Swissprot, ...)
 - Pairwise alignment; score: homologous?
 - Multiple alignment
 - Set of homologous sequences + distances

Acknowledgements
 - Alessandra Carbone
 - Sophie Abby: SOV index development and analysis
 - Thomas Rolland: database development and Web application
Lab web address: http://www.ihes.fr/~carbone/index.htm