Homologous proteins have similar structures and structural superposition means to rotate and translate the structures so that corresponding atoms are

Similar documents
Motivating the need for optimal sequence alignments...

Orientational degeneracy in the presence of one alignment tensor.

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Structure to Function. Molecular Bioinformatics, X3, 2006

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Week 10: Homology Modelling (II) - HHpred

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Analysis and Prediction of Protein Structure (I)

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Structural Alignment of Proteins

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

FoldMiner: Structural motif discovery using an improved superposition algorithm

Genomics and bioinformatics summary. Finding genes -- computer searches

Molecular Modeling lecture 4. Rotation Least-squares Superposition Structure-based alignment algorithms

Prediction and refinement of NMR structures from sparse experimental data

Molecular Modeling lecture 17, Tue, Mar. 19. Rotation Least-squares Superposition Structure-based alignment algorithms

Protein Structure Prediction

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Bio nformatics. Lecture 23. Saad Mneimneh

Introduction to" Protein Structure

Sequence analysis and comparison

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

... and searches for related sequences probably make up the vast bulk of bioinformatics activities.

Bioinformatics. Macromolecular structure

SUPPLEMENTARY FIGURES. Structure of the cholera toxin secretion channel in its. closed state

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Design of topological proteins based on concatenated coiled-coil modules Roman Jerala

of the Guanine Nucleotide Exchange Factor FARP2

SUPPLEMENTARY INFORMATION

DISCRETE TUTORIAL. Agustí Emperador. Institute for Research in Biomedicine, Barcelona APPLICATION OF DISCRETE TO FLEXIBLE PROTEIN-PROTEIN DOCKING:

Methods of Protein Structure Comparison. Irina Kufareva and Ruben Abagyan

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Protein Structure Overlap

Supporting Online Material for

Homology models of the tetramerization domain of six eukaryotic voltage-gated potassium channels Kv1.1-Kv1.6

Multiple structure alignment with mstali

SUPPLEMENTARY INFORMATION

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Supplementary Figure 1. Aligned sequences of yeast IDH1 (top) and IDH2 (bottom) with isocitrate

Supersecondary Structures (structural motifs)

Basic structures of proteins

Nature Structural and Molecular Biology: doi: /nsmb.2938

ITERATIVE NON-SEQUENTIAL PROTEIN STRUCTURAL ALIGNMENT

Chapter 9 DNA recognition by eukaryotic transcription factors

Bioinformatics: Secondary Structure Prediction

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Overview & Applications. T. Lezon Hands-on Workshop in Computational Biophysics Pittsburgh Supercomputing Center 04 June, 2015

EBI web resources II: Ensembl and InterPro

Phylogenetic analysis of Cytochrome P450 Structures. Gowri Shankar, University of Sydney, Australia.

Table 1. Crystallographic data collection, phasing and refinement statistics. Native Hg soaked Mn soaked 1 Mn soaked 2

Pymol Practial Guide

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

RNA Polymerase I Contains a TFIIF-Related DNA-Binding Subcomplex

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

CAP 5510 Lecture 3 Protein Structures

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Structural characterization of NiV N 0 P in solution and in crystal.

Rex-Family Repressor/NADH Complex

Supplementary Figure 3 a. Structural comparison between the two determined structures for the IL 23:MA12 complex. The overall RMSD between the two

Heteropolymer. Mostly in regular secondary structure

Graph Alignment and Biological Networks

Ranjit P. Bahadur Assistant Professor Department of Biotechnology Indian Institute of Technology Kharagpur, India. 1 st November, 2013

Gene regulation I Biochemistry 302. Bob Kelm February 25, 2005

A new algorithm for sequential and non-sequential protein multiple structure alignment

FlexPepDock In a nutshell

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Protein Structure Determination

Getting To Know Your Protein

Review. Membrane proteins. Membrane transport

18.4 Embryonic development involves cell division, cell differentiation, and morphogenesis

Lecture 7 Sequence analysis. Hidden Markov Models

Efficient Protein Tertiary Structure Retrievals and Classifications Using Content Based Comparison Algorithms

Goals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

Protein Structure Prediction, Engineering & Design CHEM 430

The role of form in molecular biology. Conceptual parallelism with developmental and evolutionary biology

Position-specific scoring matrices (PSSM)

Identification of Local Conformational Similarity in Structurally Variable Regions of Homologous Proteins Using Protein Blocks

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Ch. 9 Multiple Sequence Alignment (MSA)

Overview Multiple Sequence Alignment

Basics on bioinforma-cs Lecture 7. Nunzio D Agostino

Alignment & BLAST. By: Hadi Mozafari KUMS

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

proteins SHORT COMMUNICATION MALIDUP: A database of manually constructed structure alignments for duplicated domain pairs

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Villa et al. (2005) Structural dynamics of the lac repressor-dna complex revealed by a multiscale simulation. PNAS 102:

Modeling for 3D structure prediction

Protein structure alignments

Algorithms in Bioinformatics

PROTEIN-PROTEIN DOCKING REFINEMENT USING RESTRAINT MOLECULAR DYNAMICS SIMULATIONS

RNA Protein Interaction

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

Transcription:

1

Homologous proteins have similar structures and structural superposition means to rotate and translate the structures so that corresponding atoms are as close to each other as possible. Structural similarity is very apparent in these two proteins, the Green Fluorescent Protein of Aequorea victoria (1EMA) and the Red Fluorescent protein of Discosoma striata. 2

After superposition, the structure of these two proteins virtually overlap. Sequence similarity is recognizable over the whole length of the domains (top left), although slightly less than 25% identity. Also, the sequence alignment corresponds closely to the alignment derived from spatially close matching residues, computed by Chimera (bottom left). 3

With more distant homologues, superposition may be more challenging. In this example, we compare the APSES domain of yeast Mbp1 (1BM8) and the ETS domain of the human Elk-1 transcription fator (1DUX). Both domains are members of the winged helix superfamily of DNA binding modules. But there are significant topological rearrangements which make it challenging to match corresponding residues. 4

Indeed, sequence alignment fails completely to discover any reasonable region of similarity. However structural superposition matches the helix and wing motif quite well, and the superposition-derived alignment (bottom left) shows significant sequence conservation. Superposition is valuable for the analysis of distant family relationships and conservation patterns, but it has other important uses too, for the analysis of interaction sites. 5

For example, after superposition of 1BM8 with 1DUX, which is a structure of a protein DNA complex, we can study the detailed interactions of the helix with the DNA major groove, and the apex of the wing with the DNA minor groove, and evaluate whether these interactions may be conserved. This analysis may allow conclusions about the DNA binding mode of 1BM8, for which no structure of a protein-dna complex has been determined so far. 6

Optimal superposition aims to minimize the RMSD between two sets of matching atoms. RMSD or root mean square deviation is simply the square root of the average sum of squared coordinate distances. However, this is just a measure of the relationship between two sets of points in space - it depends on the distance between the point sets, their rotation and the quantitatiy we are interested in: their intrinsic structural similarity. See also: Structural Alignment (Wikipedia) (http://en.wikipedia.org/wiki/protein_structural_alignment) 7

A meaningful comparison of structural segments requires that the coordinate sets at first be optimally superimposed : this means find a translation and rotation that minimizes the residual RMSD. Note that only this analytic part is a solved problem. The choice of coordinate pairs to superimpose is difficult. Just as with sequence alignment, this choice is only straigthforward if the number of coordinates (residues) in both proteins is the same. But if there are indels, that number changes, and disordered sections of loops or termini should not be included in the superposition anyway. Moreover, RMSD values are lowest for a small, structurally conserved set of residues which may not be representative of global structural distortions. Thus the major computational challenge is to find which pairs of atoms should be matched between two structures. This problem has no clear algorithmic solution, and successful algorithms apply heuristics that may include gloabal and local similarities, coarse grained approximations of secondary structure elements, and iterated improvements. 8

Relative domain motion (and sub-domain motion to a degree) can often be approximated as independent rigid bodies, joined by a flexible hinge. Global superposition may not give satisfactory results. Local superposition requires to define the domain boundaries and does not preserve interface geometries. Superposition applies the same rotation and translation to all atoms, a smoothly varying deformation may be more appropriate to model real molecular relationships. The Godzik lab s FATCAT server addresses this problem, the algorithm is available for structure comparisons at the PDB. cf. Yuzhen Ye, Y. and Godzik, A. (2004) FATCAT: a web server for flexible structure comparison and structure similarity searching. Nucleic Acids Res. 32: W582 W585. (PMID 15215455) 9

VAST (Vector Alignment Search Tool) finds similar structures by searching for similarly orianted and arranged elements of secondary structure. 10

The list of matches returned by DALI for a search with 1BM8 has a number of very interesting hits, more, and more relevant than the hits VAST had discovered. 2XFV the N-terminal domain of yeast Swi6 does not bind DNA, and the structural superposition rationalizes this well. The two sequences have only about 10% pairwise identity after alignment: homology can not be inferred from sequence similarity in this case. 11

12