Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Similar documents
Introduction to Evolutionary Concepts

Evolution of Protein Structure in the Aminoacyl-tRNA Synthetases

Part III - Bioinformatics Study of Aminoacyl trna Synthetases. VMD Multiseq Tutorial Web tools. Perth, Australia 2004 Computational Biology Workshop

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Week 10: Homology Modelling (II) - HHpred

Characterizing molecular systems

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Large-Scale Genomic Surveys

Introduction to Evolutionary Concepts

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Bioinformatics. Dept. of Computational Biology & Bioinformatics

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Biologically significant sequence alignments using Boltzmann probabilities

Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Evolution of Translation: Dynamics of Recognition in RNA:Protein Complexes

Homology Modeling. Roberto Lins EPFL - summer semester 2005

A profile-based protein sequence alignment algorithm for a domain clustering database

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Introduction to Bioinformatics

Exercise 5. Sequence Profiles & BLAST

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Sequence Alignment Techniques and Their Uses

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Biology Tutorial. Aarti Balasubramani Anusha Bharadwaj Massa Shoura Stefan Giovan

Introduction to Bioinformatics Online Course: IBT

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

frmsdalign: Protein Sequence Alignment Using Predicted Local Structure Information for Pairs with Low Sequence Identity

PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Supporting Information for Evolutionary profiles from the QR factorization of multiple sequence alignments

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Computational Genomics and Molecular Biology, Fall

Computational Molecular Biology (

Similarity searching summary (2)

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Protein Structure Prediction

In-Depth Assessment of Local Sequence Alignment

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

EECS730: Introduction to Bioinformatics

Collected Works of Charles Dickens

Multiple sequence alignment

Pairwise & Multiple sequence alignments

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Pairwise sequence alignment

Moreover, the circular logic

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Introduction to sequence alignment. Local alignment the Smith-Waterman algorithm

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC

Building 3D models of proteins

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

Building a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Sequence comparison: Score matrices

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

CS612 - Algorithms in Bioinformatics

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Annotation Error in Public Databases ALEXANDRA SCHNOES UNIVERSITY OF CALIFORNIA, SAN FRANCISCO OCTOBER 25, 2010

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Evolution of Protein Structure

IMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

ALL LECTURES IN SB Introduction

Protein Structure Prediction 11/11/05

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences

Basic Local Alignment Search Tool

Bioinformatics for Biologists

09/06/25. Computergestützte Strukturbiologie (Strukturelle Bioinformatik) Non-uniform distribution of folds. Scheme of protein structure predicition

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

An Introduction to Sequence Similarity ( Homology ) Searching

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structures. 11/19/2002 Lecture 24 1

Tools and Algorithms in Bioinformatics

Template-Based 3D Structure Prediction

Supporting Online Material for

Ch. 9 Multiple Sequence Alignment (MSA)

CAP 5510 Lecture 3 Protein Structures

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Heuristic Alignment and Searching

Efficient Remote Homology Detection with Secondary Structure

Development and Large Scale Benchmark Testing of the PROSPECTOR_3 Threading Algorithm

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Supporting Text 1. Comparison of GRoSS sequence alignment to HMM-HMM and GPCRDB

Prediction of protein function from sequence analysis

Modeling for 3D structure prediction

MSc in Bioinformatics for Health Sciences SBI. Structural Bioinformatics

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Protein Structure Prediction using String Kernels. Technical Report

Pairwise sequence alignments

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Transcription:

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Why Look at More Than One Sequence? 1. Multiple Sequence Alignment shows patterns of conservation 2. What and how many sequences should be included? 3. Where do I find the sequences and structures for MS alignment? 4. How to generate pairwise and multiple sequence alignments?

Sequence-Sequence Alignment Smith-Watermann Needleman-Wunsch Sequence-Structure Alignment Threading Hidden Markov, Clustal Structure-Structure Alignment STAMP - Barton and Russell CE - Bourne et al. Seq. 1: a 1 a 2 a 3 - - a 4 a 5 a n Seq. 2: c 1 - c 2 c 3 c 4 c 5 - c m Sequence Database Searches Blast and Psi-Blast

Sequence-Sequence Alignment Smith-Watermann Needleman-Wunsch Sequence-Structure Alignment Threading Hidden Markov, Clustal Structure-Structure Alignment STAMP - Barton and Russell CE - Bourne et al. Profile 1: A 1 A 2 A 3 - - A 4 A 5 A n Profile 2: C 1 - C 2 C 3 C 4 C 5 - C m Sequence Database Searches SCOP, Astral PDB Blast and Psi-Blast NCBI Swiss Prot

cut

paste Choice of substitution matrix and gap penalty

Final Result: Sequence Alignment - Approximate

Tutorial for the material of this lecture available

Sequence Alignment & Dynamic Programming substitution index i number of possible alignments Seq. 1: a 1 a 2 a 3 - - a 4 a 5 a n Seq. 2: c 1 - c 2 c 3 c 4 c 5 - c m gap index j Smith-Waterman alignment algorithm objective function gap penalty substitution matrix traceback defined through choice of maximum

Blosum 30 Substitution Matrix AA not resolved

Amino Acid Three Letter and One Letter Code

Sequence Alignment & Dynamic Programming Seq. 1: a 1 a 2 a 3 - - a 4 a 5 a n Seq. 2: c 1 - c 2 c 3 c 4 c 5 - c m number of possible alignments: Needleman-Wunsch alignment algorithm S : substitution matrix A R N D C Q E G H I L K M F P S T W Y V B Z X 5-2 -1-1 - 2 0-1 1-2 -1-2 - 1-1 -3-2 1 0-3 - 2 0-1 -1 0 A -2 9 0-1 - 3 2-1 -3 0-3 -2 3-1 -2-3 -1-2 -2-1 -2-1 0-1 R -1 0 8 2-2 1-1 0 1-2 -3 0-2 -3-2 1 0-4 - 2-3 4 0-1 N -1-1 2 9-2 -1 2-2 0-4 -3 0-3 -4-2 0-1 -5-3 -3 6 1-1 D -2-3 -2-2 16-4 -2-3 -4-4 -2-3 -3-2 -5-1 -1-6 - 4-2 -2-3 -2 C 0 2 1-1 - 4 8 2-2 0-3 -2 1-1 -4-2 1-1 -1-1 -3 0 4-1 Q -1-1 -1 2-2 2 7-3 0-4 -2 1-2 -3 0 0-1 -2-2 -3 1 5-1 E 1-3 0-2 - 3-2 -3 8-2 -4-4 - 2-2 -3-1 0-2 -2-3 -4-1 -2-1 G -2 0 1 0-4 0 0-2 13-3 -2-1 1-2 -2-1 -2-5 2-4 0 0-1 H -1-3 -2-4 - 4-3 -4-4 -3 6 2-3 1 1-2 -2-1 -3 0 4-3 -4-1 I -2-2 -3-3 - 2-2 -2-4 -2 2 6-2 3 2-4 -3-1 -1 0 2-3 -2-1 L -1 3 0 0-3 1 1-2 -1-3 -2 6-1 -3-1 0 0-2 - 1-2 0 1-1 K -1-1 -2-3 - 3-1 -2-2 1 1 3-1 7 0-2 -2-1 -2 1 1-3 -2 0 M -3-2 -3-4 - 2-4 -3-3 -2 1 2-3 0 9-4 -2-1 1 4 0-3 -4-1 F -2-3 -2-2 - 5-2 0-1 -2-2 -4-1 -2-4 11-1 0-4 - 3-3 -2-1 -2 P 1-1 1 0-1 1 0 0-1 -2-3 0-2 -2-1 5 2-5 - 2-1 0 0 0 S 0-2 0-1 - 1-1 -1-2 -2-1 -1 0-1 -1 0 2 6-4 - 1 1 0-1 0 T -3-2 -4-5 - 6-1 -2-2 -5-3 -1-2 -2 1-4 -5-4 19 3-3 -4-2 -2 W -2-1 -2-3 - 4-1 -2-3 2 0 0-1 1 4-3 -2-1 3 9-1 -3-2 -1 Y 0-2 -3-3 - 2-3 -3-4 -4 4 2-2 1 0-3 -1 1-3 - 1 5-3 -3-1 V -1-1 4 6-2 0 1-1 0-3 -3 0-3 -3-2 0 0-4 - 3-3 5 2-1 B -1 0 0 1-3 4 5-2 0-4 -2 1-2 -4-1 0-1 -2-2 -3 2 5-1 Z 0-1 -1-1 - 2-1 -1-1 -1-1 -1-1 0-1 -2 0 0-2 - 1-1 -1-1 -1 X Score Matrix H: Traceback gap penalty W = - 6

Needleman-Wunsch Global Alignment Similarity Values Initialization of Gap Penalties http://www.dkfz-heidelberg.de/tbi/bioinfo/practicalsection/aliapplet/index.html

Filling out the Score Matrix H http://www.dkfz-heidelberg.de/tbi/bioinfo/practicalsection/aliapplet/index.html

Traceback and Alignment The Alignment Traceback (blue) from optimal score http://www.dkfz-heidelberg.de/tbi/bioinfo/practicalsection/aliapplet/index.html

Protein Structure Prediction 1-D protein sequence SISSIRVKSKRIQLG Homology Modeling/ FR E = E + E match gap Target Sequence SISSRVKSKRIQLGLNQAELAQKV------GTTQ QFANEFKVRRIKLGYTQ----TNVGEALAAVHGS Known structure(s) 3-D protein structure

Target sequence Sequence-Structure Alignment Alignment between target(s) and scaffold(s) Scaffold structure A 1 A 2 A 3 A 4 A 5 1. Energy Based Threading* H + E = Econtact + E profile + EH bonds n ( p) E profile = γ i i, i E contact = i, j 2 k = 1 ( A, SS SA ) γ ( ct) k i gap ( A, A ) U( r r ) i j 2. Sequence Structure Profile Alignments k ij Clustal, Hidden Markov (HMMER, PSSM) with position dependent gap penalties *R. Goldstein, Z. Luthey-Schulten, P. Wolynes (1992, PNAS), K. Koretke et.al. (1996, Proteins)

CM/Fold Recognition Results from CASP5 Lessons Learned The prediction is never better than the scaffold. Threading Energy Function and Profiles need improvement.

Structural Profiles 1.Structure more conserved than sequences!!! Similar structures at the Family and Superfamily levels. Add more structural information 2.Which structures and sequences to include? Use evolution and eliminate redundancy with QR factorization

Structural Domains

Profile - Multiple Structural Alignments Representative Profile of AARS Family Catalytic Domain

STAMP - Multiple Structural Alignments 1. Initial Alignment Inputs Multiple Sequence alignment Ridged Body Scan 2. Refine Initial Alignment & Produce Multiple Structural Alignment Dynamic Programming (Smith-Waterman) through P matrix gives optimal set of equivalent residues. This set is used to re-superpose the two chains. Then iterate until alignment score is unchanged. This procedure is performed for all pairs. R. Russell, G. Barton (1992) Proteins 14: 309.

Multiple Structural Alignments STAMP cont d 2. Refine Initial Alignment & Produce Multiple Structural Alignment Alignment score: Multiple Alignment: Create a dendrogram using the alignment score. Successively align groups of proteins (from branch tips to root). When 2 or more sequences are in a group, then average coordinates are used.

Initial Pairwise Superposition - Single Linkage Cluster N = 4 proteins, N(N-1)/2 pairs Table of RMS Dendogram: Initial step starts by fitting B to D (0.5 A), then A to C (1.0A), then fitting AC to BD using average coordinates for AC and BD pairs

Variation in Secondary Structure STAMP Output Eargle, O Donoghue, Luthey-Schulten, UIUC 2004 R. Russell, G. Barton (1992) Proteins 14: 309

Stamp Output/Clustal Format From multiple structure alignment compute position probabilities for amino acids and gaps!!!!

HMM / Clustal Models of Transmembrane Proteins N neutralized loop 1 2 3 4 5 6 7 Metalated Active State: Before Odorant Binding loop penetration N O 1 2 3 4 5 6 7 C C Bacteriorhodpsin/Rhodopsins Olfactory Receptor/Bovine Rhodopsin J. Wang, Z. Luthey-Schulten, K. Suslick (2003) PNAS 100(6):3035-9

Stamp Profile - 3 Structures

Clustal Profile-Profile Alignment Profile 1 Structures Profile 2 Sequence

PSSM-based approach I. Construction of Profile II Database search 1 2 3 4 5 Sequence 1 - B B - C Sequence 2 C C - - C Sequence 3 C B C C B Multiple Sequence Alignment Position Specific Amino Acid Probabilities j 1 2 3 4 5 Align every sequence in the database to the profile using Dynamic Programming algorithm. Sequence represented by S(S 1,S 2,,S nres ) Progressive alignment score = H(i,j) P(C j ) 1 0.33 0.5 1 0.67 P(B j ) 0 0.67 0.5 0 0.33 Position specific score for aligning i th residue of S to j th position of profile for j=1,2,...,m aln and i=1,2,...,n res Traceback gives the optimal alignment of the sequence S to profile. Altschul, et. al., Nucleic Acids Research, 25:3389-3402. Also useful: http://apps.bioneq.qc.ca/twiki/bin/view/knowledgebase/pssm

Refine Structure Prediction with Modeller 6.2 Sethi and Luthey-Schulten, UIUC 2003 Modeller 6.2 A. Sali, et al.

Web Resources for Computational Biology Sequences and Structure Databases: Proteins and Nucleic Acids SwissProt (www.expasy.org/sprot/) NCBI (www.ncbi.nlm.nih.gov/) CRW (www.rna.icmb.utexas.edu/) PDB (www.rcsb.org/pdb) SCOP/Astral (scop.mrc-lmb.cam.ac.uk/scop/) Genomic Context DOE IMG (http://img.jgi.doe.gov/) NMPDR (www.nmprdr.org) Biological Ontologies: Pathways and Functions GO, COG, KEGG (http://www.genome.jp/) 3D Protein Structure Prediction Servers: (See CASP6/CASP7) Robetta (http://robetta.bakerlab.org) Tasser (http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html SwissModel (http://swissmodel.expasy.org/swiss-model.html

Programs for Computational Biology - continued - Multiple Sequence Alignment Programs; CLUSTAl, TCOFFEE,.. Database Search Programs: BLAST, PSIBLAST, HMMER RNA 2nd Structure Prediction: MFOLD, Vienna, Multiple Sequence/Structure Alignment, DB Search, Analysis,. MultiSeq in VMD (NIH Resource UIUC) Discovery Studio ($Accelyrs) Chimera (NIH Resource UCSF),. Molecular Dynamics Simulations for Proteins, Lipids, RNA NAMD2 (NIH Resource UIUC) CHARMM ($ Accelyrs/Harvard) AMBER ($Accelyrs/Scripps) SYBIL, MOE ($),.

Non-redundant Representative Sets Too much information 129 Structures Multidimensional QR factorization of alignment matrix, A. Economy of information 16 representatives QR computes a set of maximal linearly independent structures. P. O Donoghue and Z. Luthey-Schulten (2003) MMBR 67:550-571. P. O Donoghue and Z. Luthey-Schulten (2005) J. Mol. Biol., 346, 875-894.

On to Evolution of Protein Structure