Week 10: Homology Modelling (II) - HHpred

Similar documents
Large-Scale Genomic Surveys

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Sequence analysis and comparison

Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

EECS730: Introduction to Bioinformatics

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Multiple sequence alignment

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Similarity searching summary (2)

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES

Exercise 5. Sequence Profiles & BLAST

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

09/06/25. Computergestützte Strukturbiologie (Strukturelle Bioinformatik) Non-uniform distribution of folds. Scheme of protein structure predicition

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Christian Sigrist. November 14 Protein Bioinformatics: Sequence-Structure-Function 2018 Basel

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Tools and Algorithms in Bioinformatics

Single alignment: Substitution Matrix. 16 march 2017

Introduction to Bioinformatics

A profile-based protein sequence alignment algorithm for a domain clustering database

Introduction to Comparative Protein Modeling. Chapter 4 Part I

CAP 5510 Lecture 3 Protein Structures

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Supporting Text 1. Comparison of GRoSS sequence alignment to HMM-HMM and GPCRDB

An Introduction to Sequence Similarity ( Homology ) Searching

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Quantifying sequence similarity

Template-Based 3D Structure Prediction

INDEXING METHODS FOR PROTEIN TERTIARY AND PREDICTED STRUCTURES

Protein Structure Prediction

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Protein Structure Prediction using String Kernels. Technical Report

CS612 - Algorithms in Bioinformatics

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Protein Structure Prediction, Engineering & Design CHEM 430

Ch. 9 Multiple Sequence Alignment (MSA)

In-Depth Assessment of Local Sequence Alignment

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Protein function prediction based on sequence analysis

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations

Tutorial 4 Substitution matrices and PSI-BLAST

Algorithms in Bioinformatics

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

HIDDEN MARKOV MODELS FOR REMOTE PROTEIN HOMOLOGY DETECTION

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

proteins Refinement by shifting secondary structure elements improves sequence alignments

Multiple Alignment using Hydrophobic Clusters : a tool to align and identify distantly related proteins

Building 3D models of proteins

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

Overview Multiple Sequence Alignment

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster.

Sequence Alignment Techniques and Their Uses

-max_target_seqs: maximum number of targets to report

Administration. ndrew Torda April /04/2008 [ 1 ]

Structure to Function. Molecular Bioinformatics, X3, 2006

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Computational Molecular Biology (

Similarity or Identity? When are molecules similar?

The Pennsylvania State University. The Graduate School. College of Engineering A COMPUTATIONAL FRAMEWORK FOR INFERRING STRUCTURE, FUNCTION,

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University

SUPPLEMENTARY INFORMATION

CSCE555 Bioinformatics. Protein Function Annotation

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Identification of correct regions in protein models using structural, alignment, and consensus information

Computational Biology

Fundamentals of database searching

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

Introduction to protein alignments

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

Bioinformatics: Secondary Structure Prediction

Moreover, the circular logic

Hidden Markov Models in computational biology. Ron Elber Computer Science Cornell

CS612 - Algorithms in Bioinformatics

An Introduction to Bioinformatics Algorithms Hidden Markov Models

Modeling for 3D structure prediction

G4120: Introduction to Computational Biology

Scoring Matrices. Shifra Ben-Dor Irit Orr

Sequence Analysis '17- lecture 8. Multiple sequence alignment

Hidden Markov Models (HMMs) and Profiles

Mining and classification of repeat protein structures

PDBe TUTORIAL. PDBePISA (Protein Interfaces, Surfaces and Assemblies)

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Transcription:

Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1

2

Identify and align related structures by sequence methods is not an easy task All comparative modeling programs strongly depend on a target-template alignment The template search methods usually produce a non-optimal alignment, especially in the more difficult cases (< 30% ID) 3

Alignment methods depend on identity level > 30% sequence identity Automatic methods for sequence-sequence alignment are usually accurate enough < 30% sequence identity Manual alignment curation required Use structural information (e.g. avoid gaps in secondary structure elements) Misalignments are critical: each mistake in buried regions is estimated to cause a ~4Å deviation in the model!! Therefore for this level of identity, more accurate methods are required 4

5

HHpred Remote Homology detection & structure prediction HHpred is a method for protein remote homology detection and 3D structure prediction based on the pairwise comparison of profile hidden Markov models (HMM-HMM alignment). HHpred is as easy to use as BLAST or PSI-BLAST but at the same time is much more sensitive in finding remote homology. HHpred accepts a single query sequence or a multiple alignment as input and it returns possible templates, E-value, etc. HHpred can can also produce 3D-structural models calculated by the MODELLER software. 6

Profile based methods are more accurate When searching for remote homologs, it is wise to make use of as much information about the query and database proteins as possible in order to better distinguish true from false templates. For that purpose they use PROFILES Profiles contain information about the importance of each position for finding other members of the protein family. Profile methods are generally superior to sequence based methods. 7

What is a PSSM? A PSSM, or Position-Specific Scoring Matrix, is a type of scoring matrix used in protein PSI-BLAST searches. PSSM amino acid substitution scores are position dependent in a protein multiple sequence alignment. Thus, a Tyr-Arg substitution may score different for different alignment positions. This is in contrast to position-independent matrices such as the PAM and BLOSUM matrices, in which the Tyr-Trp substitution receives the same score no matter at what position it occurs. 8

Position-Specific Scoring Matrix (PSSM) NCBI Viewer 9

profile-profile comparison (alignment) method 1. Calculate template & target profiles by constructing alignments them with sequences from a NR database 2.Align the target and the template profiles Align to Profile 1 Profile 2 Profiles contain more evolutionary information about the family Profile-profile alignment methods are able to provide better alignments with distant homologues 10

Alignment is done using HMM Hidden Markov Model A Hidden Markov model (HMM) is a statistical method. For a specific alignment, we can calculate an HMM model and extract information about a match (M), insertion (I) and a deletion states (D) for each column. The extracted model parameters can then be used to align a new query sequence. 11

HHpred method Target sequence or alignment N x PSI- BLAST Alignment based on PSI-BLAST sequences Collection of target homologs on NR Profile HMM for target alignment Generate positionspecific gap penalties Probabilities M, I and D for 20 aa HHsearch HMM-HMM alignment between target and template (PDB, PFAM) Alignment and secondary structure prediction Easy-to-read results, including E-values and true probabilities, secondary structure, consensus sequences and position-specific reliability 12

Criteria to choose within multiple PDB available templates 1. Higher sequence identity 2.Close subfamily 3. Environment similarity (solvent, ph, ligand, quaternary interactions) 4.The quality of the experimentally determined structure (RMSD) 5.Purpose of modeling (e.g. protein-ligand model) 13

How can I verify if a database distant match is really homologous? 1. Check probability and E-value 2. Check if homology is biologically suggestive or at least reasonable 3. Check secondary structure similarity 4. Check relationship among top hits 5. Check for possible conserved motifs (and their residues) 6. Check query and template alignments! 7. Use HHsenser to find more homologs for your query alignment 8. Verify predictions experimentally 9. Try out other structure prediction servers! 14

Alignment.ali (in PIR format) Indicates that this is the template structure PIR Alignment used in Modeller Must match the PDB file name Residues that take part in the alignment (pdb indexing!) Indicates that this is the target sequence http://expasy.org/tools/ Target name Indicates end of alignment Residues that take part in the alignment Indicates end of alignment 15