Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Aims of Structural Genomics
- High-throughput 3D structure determination and analysis
- To determine or predict the 3D structures of all the proteins encoded in the genome
- Up to 40% of the known protein sequences have at least one segment related to one or more structures
  => Determine all of the folds
  => Use homology modeling to predict 3D structures

Growth in the PDB

What is Homology?
- Homology: having a common evolutionary origin
- Cannot be partial: two proteins either share a common ancestor or they do not
- Assertion of homology is a hypothesis
- The hypothesis is usually based on the extent of sequence similarity between proteins, though similar functions should also be demonstrated

Some Definitions
- Homologues (homologs): proteins that are evolutionarily related
- Orthologues (orthologs): homologues in different organisms that diverged through speciation
- Paralogues (paralogs): homologues that arose through gene duplication, typically found within the same organism

Basis of Homology Modeling
- 3D structures are conserved to a greater extent than primary structures
- Develop models of protein structure based on the structures of homologues
- Using a known structure as a template, calculate a 3D model of a protein for which only the sequence is known (the "target")

Steps in Homology Modeling

Template Selection
- Identify protein structures related to the target and select those to be used as templates
- Involves searching a database of proteins with known structures (e.g., with BLAST at NCBI)
- Involves a certain amount of sequence alignment

Aligning Sequences
- Critical step in homology modeling
- Many options to consider
- Factors to consider
  - Which algorithm to use
  - Which scoring method to apply
  - Whether and how to assign gap penalties
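
To make the three factors above concrete, here is a minimal sketch of a global alignment by dynamic programming (Needleman-Wunsch). The slides do not name a specific algorithm, so this choice is an assumption, as are the simple match/mismatch scores and the linear gap penalty; real modeling pipelines typically use a substitution matrix and affine gap penalties instead.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b).

    match, mismatch and gap are illustrative values, not recommendations.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GATACA"))
```

Changing `gap` relative to `match` and `mismatch` shifts the balance between opening gaps and accepting mismatches, which is why the gap penalty is listed as a separate decision.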

Scoring Alignments
- Need some method of scoring to find the optimal alignment
- Four general types of scoring have been applied
  - Identity: considers only identical residues
  - Genetic code: considers the number of base changes in DNA or RNA needed to interconvert codons for the amino acids
  - Chemical similarity: considers physico-chemical properties
  - Observed substitutions: considers substitution frequencies observed in alignments of sequences (*used the most*)
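
As a rough illustration of the first and third scoring types, the sketch below scores an existing alignment by identity and by chemical similarity. The residue grouping in `CHEMICAL_GROUPS` is a hypothetical, deliberately coarse classification chosen for this example, and counting only columns without gaps is one of several conventions in use.

```python
# Hypothetical coarse physico-chemical grouping, for illustration only.
CHEMICAL_GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
GROUP_OF = {aa: name for name, members in CHEMICAL_GROUPS.items() for aa in members}


def _aligned_pairs(aligned_a, aligned_b):
    """Return residue pairs from columns that contain no gap character."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]


def percent_identity(aligned_a, aligned_b):
    """Percentage of gap-free columns with identical residues."""
    pairs = _aligned_pairs(aligned_a, aligned_b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(aligned_a, aligned_b):
    """Percentage of gap-free columns whose residues share a chemical group."""
    pairs = _aligned_pairs(aligned_a, aligned_b)
    if not pairs:
        return 0.0
    similar = sum(GROUP_OF.get(x) == GROUP_OF.get(y) for x, y in pairs)
    return 100.0 * similar / len(pairs)


if __name__ == "__main__":
    a, b = "ACDE-FG", "ACEEKYG"
    print(percent_identity(a, b), percent_similarity(a, b))
```

In the example pair, D/E and F/Y are not identical but fall in the same group, so the similarity score is higher than the identity score.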

Scoring Matrices
- PAM40: short, highly similar sequences
- PAM160: detecting members of a protein family
- PAM250: longer, more divergent sequences
- BLOSUM90: short, highly similar sequences
- BLOSUM80: detecting members of a protein family
- BLOSUM62: most effective in finding all potential similarities
- BLOSUM30: longer, more divergent sequences

Log-Odds Matrix
$$S_{i,j} = \log\left[\frac{q_{i,j}}{p_i\,p_j}\right]$$
- $q_{i,j}$ = frequency of substitution between residues $i$ and $j$
- $p_i$, $p_j$ = probabilities of occurrence of residues $i$ and $j$ in proteins
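
A small worked sketch of the log-odds formula follows. The frequencies passed in are hypothetical numbers picked so the arithmetic is easy to follow; the base-2 logarithm, the scale factor of 2 (half-bit units) and the rounding to an integer are assumptions borrowed from common matrix conventions, not something the slide specifies.

```python
import math


def log_odds_score(q_ij, p_i, p_j, scale=2.0):
    """Rounded log-odds score: scale * log2(q_ij / (p_i * p_j)).

    q_ij : observed frequency of residues i and j being aligned
    p_i, p_j : background frequencies of residues i and j
    scale=2.0 expresses the score in half-bit units (an assumed convention).
    """
    return round(scale * math.log2(q_ij / (p_i * p_j)))


if __name__ == "__main__":
    # Hypothetical frequencies: both residues occur with probability 0.05,
    # so chance alone would pair them with frequency 0.05 * 0.05 = 0.0025.
    print(log_odds_score(0.01, 0.05, 0.05))      # observed 4x more often than chance
    print(log_odds_score(0.0025, 0.05, 0.05))    # observed exactly as often as chance
    print(log_odds_score(0.000625, 0.05, 0.05))  # observed 4x less often than chance
```

A positive score means the pair is seen more often than expected by chance, zero means it is seen as often as chance predicts, and a negative score means the substitution is disfavored.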

Building the 3D Model
- Rigid body assembly
  - Rigid bodies from aligned sequences
  - Core region, loops, and side chains
- Satisfaction of spatial restraints
  - Generate restraints from templates
  - Assume distances and angles between aligned positions in template and target are similar
  - Minimize violations of all restraints using distance geometry or optimization techniques (e.g., with a force field)
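
The restraint-satisfaction idea can be sketched on a toy system. The example below assumes three pseudo-atoms, distance-only restraints, a squared-violation penalty and plain gradient descent; all of these are simplifications chosen for illustration, and the target distances are hypothetical stand-ins for values that would be read from a template.

```python
import math


def restraint_violation(coords, restraints):
    """Sum of squared differences between actual and target distances."""
    return sum(
        (math.dist(coords[i], coords[j]) - target) ** 2
        for i, j, target in restraints
    )


def satisfy_restraints(coords, restraints, step=0.05, n_steps=2000):
    """Move points by gradient descent to reduce restraint violations."""
    coords = [list(p) for p in coords]  # work on a mutable copy
    for _ in range(n_steps):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for point, grad in zip(coords, grads):
            for k in range(3):
                point[k] -= step * grad[k]
    return coords


if __name__ == "__main__":
    # Hypothetical starting coordinates and template-derived target distances (in Å).
    start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 5.5)]
    final = satisfy_restraints(start, restraints)
    print(restraint_violation(start, restraints))
    print(restraint_violation(final, restraints))
```

A production restraint-based modeler combines many more restraint types (angles, dihedrals, stereochemistry) and uses more robust optimizers, but the objective has the same shape: a sum of penalties that is driven toward zero.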

Evaluation of Model Quality
- Check for proper protein stereochemistry
  - ProCheck (http://biotech.ebi.ac.uk:8400/cgi-bin/sendquery): Ramachandran plot, bond lengths
  - Whatif (http://www.cmbi.kun.nl/gv/servers/wiwwwi): packing quality
  - Both are web servers
- Fitness of sequence to structure
  - ProsaII (http://lore.came.sbg.ac.at/services/prosa.html): program runs on Linux and Unix
  - Verify3D (http://www.doe-mbi.ucla.edu/services/verify_3d/): web server

Evaluating the 3D Model: Procheck
- Ramachandran plot
- Planar peptide bonds
- Side chain conformations that correspond to those in a rotamer library
- Hydrogen bonding
- No bad atom-atom contacts
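
A Ramachandran plot is built from the backbone dihedral angles phi and psi of each residue. The sketch below shows one common way to compute a dihedral angle from four points; the helper names are hypothetical, the coordinates in the demo are made-up test geometry rather than real atoms, and this is not meant to reproduce Procheck's own implementation.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# For residue i of a protein backbone:
#   phi = dihedral(C of residue i-1, N of i, CA of i, C of i)
#   psi = dihedral(N of i, CA of i, C of i, N of residue i+1)

if __name__ == "__main__":
    # Made-up geometry: cis (0 degrees), a right angle, and trans (180 degrees).
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Plotting the (phi, psi) pair of every residue and checking how many fall in favored regions is the core of the Ramachandran check listed above.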

Evaluating the 3D Model: 3D-Profiler (Verify 3D)
- Based on statistical preferences of each of the 20 amino acids for particular environments within a protein
- Residue positions are characterized by environment
- Preferred environments are defined by three parameters
  - Area of each residue that is buried
  - Fraction of side-chain area that is covered by polar atoms (i.e., O and N)
  - Local secondary structure
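
The toy sketch below shows the general shape of such a profile score: each position is assigned an environment class from the three parameters, and the residue at that position is scored by how well it suits the class. The cutoffs in `environment_class` and every number in `PROFILE_SCORES` are hypothetical placeholders invented for this example; they are not the published Verify3D classes or scores.

```python
def environment_class(buried_area, polar_fraction, secondary_structure):
    """Map the three parameters to a coarse environment label.

    Cutoffs are hypothetical placeholders (area in square angstroms).
    """
    if buried_area > 100.0:
        burial = "buried"
    elif buried_area > 40.0:
        burial = "partial"
    else:
        burial = "exposed"
    polarity = "polar" if polar_fraction > 0.5 else "apolar"
    return (burial, polarity, secondary_structure)


# Hypothetical compatibility scores for (residue, environment class) pairs.
PROFILE_SCORES = {
    ("L", ("buried", "apolar", "helix")): 1.2,
    ("L", ("exposed", "polar", "coil")): -0.9,
    ("K", ("buried", "apolar", "helix")): -1.5,
    ("K", ("exposed", "polar", "coil")): 0.8,
}


def profile_score(sequence, environments, default=0.0):
    """Sum residue-environment compatibility over all positions."""
    if len(sequence) != len(environments):
        raise ValueError("need one environment per residue")
    total = 0.0
    for residue, (area, polar, ss) in zip(sequence, environments):
        env = environment_class(area, polar, ss)
        total += PROFILE_SCORES.get((residue, env), default)
    return total


if __name__ == "__main__":
    envs = [(120.0, 0.1, "helix"), (20.0, 0.8, "coil")]
    print(profile_score("LK", envs))  # residues suit their environments
    print(profile_score("KL", envs))  # same environments, residues swapped
```

A low or negative total over a stretch of residues suggests that the sequence does not fit the modeled environment there, which often points to an alignment error in that region.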

Refining the 3D Model
- MD and energy minimization
- Application of restraints based on experimental data (e.g., NMR, fluorescence)
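
Energy minimization can be illustrated on the smallest possible system. The sketch below assumes a single pair of atoms interacting through a Lennard-Jones potential in reduced units and uses fixed-step steepest descent; both the potential and the optimizer are illustrative choices, whereas real refinement uses a full force field and more sophisticated minimizers.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_separation(r, step=0.01, tol=1e-10, max_steps=10000):
    """Steepest descent on r until the gradient is close to zero."""
    for _ in range(max_steps):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad  # move downhill along the energy surface
    return r


if __name__ == "__main__":
    r_min = minimize_separation(1.5)
    print(r_min, lj_energy(r_min))
```

Starting from a stretched separation, the descent settles at the bottom of the energy well, where the separation is 2^(1/6) times sigma and the energy equals minus epsilon.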

Applications of the Model