Protein Structure Prediction and Display

Similar documents
Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

Basics of protein structure

Bioinformatics: Secondary Structure Prediction

Protein Structures: Experiments and Modeling. Patrice Koehl

Physiochemical Properties of Residues

Protein Secondary Structure Prediction

PROTEIN SECONDARY STRUCTURE PREDICTION: AN APPLICATION OF CHOU-FASMAN ALGORITHM IN A HYPOTHETICAL PROTEIN OF SARS VIRUS

Protein Secondary Structure Prediction

Steps in protein modelling. Structure prediction, fold recognition and homology modelling. Basic principles of protein structure

Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction. Sepp Hochreiter

Getting To Know Your Protein

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Protein Structure Prediction

Bioinformatics: Secondary Structure Prediction

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Protein Secondary Structure Assignment and Prediction

Section II Understanding the Protein Data Bank

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

Orientational degeneracy in the presence of one alignment tensor.

HIV protease inhibitor. Certain level of function can be found without structure. But a structure is a key to understand the detailed mechanism.

Protein Secondary Structure Prediction using Feed-Forward Neural Network

Improved Protein Secondary Structure Prediction

BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer Protein Structure Prediction I

CAP 5510 Lecture 3 Protein Structures

Protein Data Bank Contents Guide: Atomic Coordinate Entry Format Description. Version Document Published by the wwpdb

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Secondary Structure Prediction using Pattern Recognition Neural Network

Lecture 7. Protein Secondary Structure Prediction. Secondary Structure DSSP. Master Course DNA/Protein Structurefunction.

Packing of Secondary Structures

Optimization of the Sliding Window Size for Protein Structure Prediction

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Pymol Practial Guide

ALL LECTURES IN SB Introduction

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

CS612 - Algorithms in Bioinformatics

Molecular Modeling lecture 2

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

BCB 444/544 Fall 07 Dobbs 1

IT og Sundhed 2010/11

3D Structure. Prediction & Assessment Pt. 2. David Wishart 3-41 Athabasca Hall

Viewing and Analyzing Proteins, Ligands and their Complexes 2

PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES

Structure and evolution of the spliceosomal peptidyl-prolyl cistrans isomerase Cwc27

Visualization of Macromolecular Structures

Protein Secondary Structure Prediction

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

Advanced Certificate in Principles in Protein Structure. You will be given a start time with your exam instructions

Analysis and Prediction of Protein Structure (I)

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Computational Structural Biology and Molecular Simulation. Introduction to VMD Molecular Visualization and Analysis

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Heteropolymer. Mostly in regular secondary structure

The Select Command and Boolean Operators

SUPPLEMENTARY MATERIALS

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Protein structure alignments

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Week 10: Homology Modelling (II) - HHpred

Protein Bioinformatics Computer lab #1 Friday, April 11, 2008 Sean Prigge and Ingo Ruczinski

BCH 4053 Spring 2003 Chapter 6 Lecture Notes

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

Ligand Scout Tutorials

Sequential resonance assignments in (small) proteins: homonuclear method 2º structure determination

Comparing whole genomes

The Structure and Functions of Proteins

Protein Structure Basics

Chapter 2 Structures. 2.1 Introduction Storing Protein Structures The PDB File Format

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

D Dobbs ISU - BCB 444/544X 1

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

1. (5) Draw a diagram of an isomeric molecule to demonstrate a structural, geometric, and an enantiomer organization.

NMR, X-ray Diffraction, Protein Structure, and RasMol

What makes a good graphene-binding peptide? Adsorption of amino acids and peptides at aqueous graphene interfaces: Electronic Supplementary

Part 4 The Select Command and Boolean Operators

Lecture 14 Secondary Structure Prediction

Central Dogma. modifications genome transcriptome proteome

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Assignment 2 Atomic-Level Molecular Modeling

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Conformational Geometry of Peptides and Proteins:

DATE A DAtabase of TIM Barrel Enzymes

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

SINGLE-SEQUENCE PROTEIN SECONDARY STRUCTURE PREDICTION BY NEAREST-NEIGHBOR CLASSIFICATION OF PROTEIN WORDS

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Protein Structure: Data Bases and Classification Ingo Ruczinski

Prediction of protein secondary structure by mining structural fragment database

THE UNIVERSITY OF MANITOBA. PAPER NO: 409 LOCATION: Fr. Kennedy Gold Gym PAGE NO: 1 of 6 DEPARTMENT & COURSE NO: CHEM 4630 TIME: 3 HOURS

Better Bond Angles in the Protein Data Bank

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Predicting Protein Structural Features With Artificial Neural Networks

Protein Structure Prediction

Supersecondary Structures (structural motifs)

LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor

NMR Assignments using NMRView II: Sequential Assignments

Building 3D models of proteins

Computer simulations of protein folding with a small number of distance restraints

Section Week 3. Junaid Malek, M.D.

Protein Science (1997), 6: Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society

Transcription:

Protein Structure Prediction and Display Goal Take primary structure (sequence) and, using rules derived from known structures, predict the secondary structure that is most likely to be adopted by each residue

Secondary structure Alpha helix Secondary structure Beta sheet

Secondary structure Random coil Structural Propensities Due to the size, shape and charge of its side chain, each amino acid may fit better in one type of secondary structure than another Classic example: The rigidity and side chain angle of proline cannot be accomodated in an α-helical structure

Structural Propensities Two ways to view the significance of this preference (or propensity) It may control or affect the folding of the protein in its immediate vicinity (amino acid determines structure) It may constitute selective pressure to use particular amino acids in regions that must have a particular structure (structure determines amino acid) Secondary structure prediction In either case, amino acid propensities should be useful for predicting secondary structure Two classical methods that use previously determined propensities: Chou-Fasman Garnier-Osguthorpe-Robson

Chou-Fasman method Uses table of conformational parameters (propensities) determined primarily from measurements of secondary structure by CD spectroscopy Table consists of one likelihood for each structure for each amino acid Chou-Fasman propensities (partial table) Amino Acid P α P β P t Glu Met 1.51 0.37 0.7 1.5 1.05 0.60 Ala 1.2 0.83 0.66 Val 1.06 1.70 0.50 Ile 1.08 1.60 0.50 Tyr 0.69 1.7 1.1 Pro 0.57 0.55 1.52 Gly 0.57 0.75 1.56

Chou-Fasman method A prediction is made for each type of structure for each amino acid Can result in ambiguity if a region has high propensities for both helix and sheet (higher value usually chosen, with exceptions) Chou-Fasman method Calculation rules are somewhat ad hoc Example: Method for helix Search for nucleating region where out of 6 a.a. have P α > 1.03 Extend until consecutive a.a. have an average P α < 1.00 If region is at least 6 a.a. long, has an average P α > 1.03, and average P α > average P β, consider region to be helix

Garnier-Osguthorpe- Robson Uses table of propensities calculated primarily from structures determined by X-ray crystallography Table consists of one likelihood for each structure for each amino acid for each position in a 17 amino acid window Garnier-Osguthorpe- Robson Analogous to searching for features with a 17 amino acid wide frequency matrix One matrix for each feature α-helix β-sheet turn coil Highest scoring feature is found at each location

Accuracy of predictions Both methods are only about 55-65% accurate A major reason is that while they consider the local context of each sequence element, they do not consider the global context of the sequence - the type of protein The same amino acids may adopt a different configuration in a cytoplasmic protein than in a membrane protein Adaptive methods Neural network methods - train network using sets of known proteins then use to predict for query sequence nnpredict Homology-based methods - predict structure using rules derived only from proteins homologous to query sequence SOPM PHD

Neural Network methods A neural network with multiple layers is presented with known sequences and structures - network is trained until it can predict those structures given those sequences Allows network to adapt as needed (it can consider neighboring residues like GOR) Neural Network methods Different networks can be created for different types of proteins

Neural Network for secondary structure A C D E F G H I K L M N P Q R S T V W Y. D (L) R (E) Q (E) G (E) F (E) V (E) P (E) A (H) A (H) Y (H) V (E) K (E) K (E) H E L Homology-based modeling Principle: From the sequences of proteins whose structures are known, choose a subset that is similar to the query sequence Develop rules (e.g., train a network) for just this subset Use these rules to make prediction for the query sequence

Structural homology It is useful for new proteins whose 3D structure is not known to be able to find proteins whose 3D structure is known that are expected to have a similar structure to the unknown It is also useful for proteins whose 3D structure is known to be able to find other proteins with similar structures Finding proteins with known structures based on sequence homology If you want to find known 3D structures of proteins that are similar in primary amino acid sequence to a particular sequence, can use BLAST web page and choose the PDB database This is not the PDB database of structures, rather a database of amino acid sequences for those proteins in the structure database Links are available to retrieve PDB files

Finding proteins with similar structures to a known protein For literature and sequence databases, Entrez allows neighbors to be found for a selected entry based on homology in terms (MEDline database) or sequence (protein and nucleic acid sequence databases) An experimental feature allows neighbors to be chosen for entries in the structure database Finding proteins with similar structures to a known protein Proteins with similar structures are termed VAST Neighbors by Entrez (VAST refers to the method used to evaluate similarity of structure) VAST or structure neighbors may or may not have sequence homology to each other

PHDse c local alignment 13 adjacent residues global statist. whole protein ::: AAA AA. LLL LII AAG CCS GVV ::: %AA Length ² N-term ² C-term input local in sequence A C L I G S V ins del cons 100 0 0 0 0 0 0 0 0 1.17 100 0 0 0 0 0 0 33 0 0.2 0 0 100 0 0 0 0 0 33 0.92 0 0 33 66 0 0 0 0 0 0.7 66 0 0 0 33 0 0 0 0 1.17 0 66 0 0 0 33 0 0 0 0.7 0 0 0 33 0 0 66 0 0 0.8 input global in sequence percentage of each amino acid in protein length of protein (Š60, Š120, Š20, >20) distance: centre, N-term (Š0,Š30,Š20,Š10) distance: centre, C-term (Š0,Š30,Š20,Š10) input layer 21+3 20 hidden layer output layer H E L +1 20 H E L 0.5 0.1 0. first level sequence-tostructure second level structure-tostructure PHDhtm local alignment 13 adjacent residues ::: AAA AA. LLL LII AAG CCS GVV ::: global %AA statist. Length whole ² N-term protein ² C-term input local in sequence A C L I G S V ins del cons 100 0 0 0 0 0 0 0 0 1.17 100 0 0 0 0 0 0 33 0 0.2 0 0 100 0 0 0 0 0 33 0.92 0 0 33 66 0 0 0 0 0 0.7 66 0 0 0 33 0 0 0 0 1.17 0 66 0 0 0 33 0 0 0 0.7 0 0 0 33 0 0 66 0 0 0.8 input global in sequence percentage of each amino acid in protein length of protein (Š60, Š120, Š20, >20) distance: centre, N-term (Š0,Š30,Š20,Š10) distance: centre, C-term (Š0,Š30,Š20,Š10) input layer 21+3 20 hidden layer output layer HTM non HTM 3+1 20 HTM non HTM first level sequence-tostructure second level structure-tostructure

PHDa c c local alignment 13 adjacent residues global statist. whole protein ::: AAA AA. LLL LII AAG CCS GVV ::: %AA Length ² N-term ² C-term input local in sequence A C L I G S V ins del cons 100 0 0 0 0 0 0 0 0 1.17 100 0 0 0 0 0 0 33 0 0.2 0 0 100 0 0 0 0 0 33 0.92 0 0 33 66 0 0 0 0 0 0.7 66 0 0 0 33 0 0 0 0 1.17 0 66 0 0 0 33 0 0 0 0.7 0 0 0 33 0 0 66 0 0 0.8 input global in sequence percentage of each amino acid in protein length of protein (Š60, Š120, Š20, >20) distance: centre, N-term (Š0,Š30,Š20,Š10) distance: centre, C-term (Š0,Š30,Š20,Š10) input layer output layer 81 hidden layer 6 21+3 9 36 20 25 16 9 2 1 0 first level only Retrieving 3D structures Protein Data Bank (PDB) using web browser home page = http://www.pdb.bnl.gov/ using anonymous FTP Entrez using web browser BLAST using web browser

Displaying Structures with RasMol The GIF image of Ribonuclease A is static - we cannot rotate the molecule or recolor portions of it to aid visualization For this we can use RasMol, a public domain program available for wide range of computers, including MacOS, Windows and Unix Displaying Structures with RasMol Drs. David Hackney and Will McClure have developed an online tutorial for RasMol - a link may be found on the 03-310, 03-311 and 03-510 web pages

PDB files In order to optimally display, rotate and color the 3D structure, we need to download a copy of the coordinates for each atom in the molecule to our local computer The most common format for storage and exchange of atomic coordinates for biological molecules is PDB file format PDB files PDB file format is a text (ASCII) format, with an extensive header that can be read and interpreted either by programs or by people We can request either the header only or the entire file; the next screen requests the header only

http://www.pdb.bnl.gov/pdb-bin/opdbshort http://www.pdb.bnl.gov/pdb-bin/send-pdb?filename=1rat&short=1

http://www.pdb.bnl.gov/pdb-bin/opdbshort