Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Similar documents
Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Template Free Protein Structure Modeling Jianlin Cheng, PhD

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Basics of protein structure

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Selected Publications through 2007 Christodoulos A. Floudas

Supporting Online Material for

Protein structure analysis. Risto Laakso 10th January 2005

Prediction and refinement of NMR structures from sparse experimental data

Dihedral Angles. Homayoun Valafar. Department of Computer Science and Engineering, USC 02/03/10 CSCE 769

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

Ab-initio protein structure prediction

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Bioinformatics. Macromolecular structure

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

Introduction to" Protein Structure

Predicting Peptide Structures Using NMR Data and Deterministic Global Optimization

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Protein Structure Prediction, Engineering & Design CHEM 430

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Physiochemical Properties of Residues

Bio nformatics. Lecture 23. Saad Mneimneh

A Physical Approach to Protein Structure Prediction

Protein Structure Determination

Motif Prediction in Amino Acid Interaction Networks

Implementation of an αbb-type underestimator in the SGO-algorithm

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Modeling for 3D structure prediction

Protein Folding Prof. Eugene Shakhnovich

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein Structure Prediction

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

ALL LECTURES IN SB Introduction

Computer simulations of protein folding with a small number of distance restraints

Iterative Assembly of Helical Proteins by Optimal Hydrophobic Packing

Supersecondary Structures (structural motifs)

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

CAP 5510 Lecture 3 Protein Structures

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

EBBA: Efficient Branch and Bound Algorithm for Protein Decoy Generation

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Orientational degeneracy in the presence of one alignment tensor.

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Secondary and sidechain structures

Protein Structure Basics

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

The typical end scenario for those who try to predict protein

Predicting Protein Tertiary Structure using a Global Optimization Algorithm with Smoothing

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

A General Model for Amino Acid Interaction Networks

COMBINED MULTIPLE SEQUENCE REDUCED PROTEIN MODEL APPROACH TO PREDICT THE TERTIARY STRUCTURE OF SMALL PROTEINS

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Finding Similar Protein Structures Efficiently and Effectively

Presenter: She Zhang

Minimalist Representations and the Importance of Nearest Neighbor Effects in Protein Folding Simulations

Chemical Shift Restraints Tools and Methods. Andrea Cavalli

CS612 - Algorithms in Bioinformatics

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

D Dobbs ISU - BCB 444/544X 1

Homology models of the tetramerization domain of six eukaryotic voltage-gated potassium channels Kv1.1-Kv1.6

Assignment 2 Atomic-Level Molecular Modeling

Protein Structure Determination Using NMR Restraints BCMB/CHEM 8190

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Introduction to Computational Structural Biology

Crystal Structure Prediction using CRYSTALG program

Analysis and Prediction of Protein Structure (I)

Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach

Structural Alignment of Proteins

Kd = koff/kon = [R][L]/[RL]

Packing of Secondary Structures

Template-Based Modeling of Protein Structure

Lecture 34 Protein Unfolding Thermodynamics

Iterative Assembly of Helical Proteins by Optimal Hydrophobic Packing

BIOCHEMISTRY Unit 2 Part 4 ACTIVITY #6 (Chapter 5) PROTEINS

Protein Structure: Data Bases and Classification Ingo Ruczinski

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

proteins Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field

Better Bond Angles in the Protein Data Bank

PROTEIN'STRUCTURE'DETERMINATION'

Optimisation: From theory to better prediction

CHEM 463: Advanced Inorganic Chemistry Modeling Metalloproteins for Structural Analysis

Building a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor

Improving De novo Protein Structure Prediction using Contact Maps Information

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

NMR, X-ray Diffraction, Protein Structure, and RasMol

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein quality assessment

Computational Protein Design

Transcription:

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Department of Chemical Engineering Program of Applied and Computational Mathematics Department of Operations Research and Financial Engineering Center for Quantitative Biology

Outline Protein structure prediction overview Predicting α-helical contacts Probability development Model Results Predicting α-helical contacts in α/β proteins Distance bounding Model Results Structure prediction of α-helical proteins Framework Results

Protein Structure Prediction Problem Given an amino acid sequence, identify the three-dimensional protein structure Approaches Homology modeling Fold recognition/threading Fragment assembly First Principles - Optimization Statistical potentials Physics-based potentials TLQAETDQLEDEKSALQ??? Floudas, Biotechnology & Bioengineering, 2007. Floudas. AIChE Journal. 2005, 51:1872-1884. Floudas, et al. Chemical Engineering Science. 2006, 61:966-988.

ASTRO-FOLD Helix Prediction -Detailed atomistic modeling -Simulations of local interactions (Free Energy Calculations) β-sheet Prediction -Novel hydrophobic modeling -Predict list of optimal topologies (Combinatorial Optimization, MILP) Loop Structure Prediction -Dihedral angle sampling -Discard conformers by clustering (Novel Clustering Methodology) Derivation of Restraints -Dihedral angle restrictions -C α distance constraints (Reduced Search Space) Overall 3D Structure Prediction -Structural data from previous stages -Prediction via novel solution approach (Global Optimization and Molecular Dynamics) Klepeis, JL and Floudas, CA. Biophys J. (2003)

Outline Protein structure prediction overview Predicting α-helical contacts Probability development Model Results Predicting α-helical contacts in α/β proteins Distance bounding Model Results Structure prediction of α-helical proteins Framework Results

Overview Problem Topology prediction of globular α-helical proteins Approach Thesis: Topology is based on certain Inter-helical Hydrophobic to Hydrophobic Contacts Create a dataset of helical proteins Develop inter-helical contact probabilities Apply two novel mixed-integer optimization models (MILP) Level 1 - PRIMARY contacts Level 2 - WHEEL contacts McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Dataset Selection Protein Sources 229 PDBSelect25 1 database 62 CATH 2 database 20 Zhang et al. 3 7 Huang et al. 4 Restrictions No β-sheets, at least 2 α-helices No highly similar sequences Dataset 318 proteins in the database set 1 Hobohm, U. and C.Sander. Prot Sci 3 (1994) 522 2 Orengo, C.A. et al. Structure 5 (1997) 1093. 3 Zhang, C. et al. PNAS 99 (2002) 3581. 4 Huang, E.S. et al. J Mol Biol 290 (1999) 267. McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Probability Development Contact Types PRIMARY contact Minimum distance hydrophobic contact between 4.0 Å and 10.0 Å WHEEL contact Only WHEEL position hydrophobic contacts between 4.0 Å and 12.0 Å Classified as parallel or antiparallel contacts McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Model Overview Formulation: Maximize inter-helical residue-residue contact probabilities Binary variable helical contact Binary variable A yh m,n m n w, i, j indicates antiparallel indicates residue contact Goal: Produce a rank-ordered list of the most likely helical contacts Contacts used to restrict conformational space explored during protein tertiary structure prediction McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Pairwise Model Objective Level 1 Objective Maximize probability of pairwise residueresidue contacts McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Pairwise Model Constraints Level 1 Constraints At most one contact per position Helix-helix interaction direction Linking interaction variables McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Pairwise Model Constraints Level 1 Constraints Restrict number of contacts between a given helix pair (MAX_CONTACT) Vary the number of helix-helix interactions (SUBTRACT) McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Pairwise Model Constraints Level 1 Constraints Allow for and Limit helical kinks McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Pairwise Model Constraints Level 1 Constraints Consistent numbering i j k l McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Pairwise Model Constraints Feasible topologies 1 1 m n p

Pairwise Model Objective Level 2 Objective Maximize the sum of predicted wheel probabilities McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Pairwise Model Constraints Level 2 Constraints Require at most one wheel contact for a specified primary contact Level 2 Aim Distinguish between equally likely Level 1 predictions Increase the total number of contact predictions McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Results 2-3 helix bundles PDB:1mbh in PyMol PDB:1nre in PyMol McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Results 1nre Contact Predictions subtract 0, max_contact 2 PRIMARY Contact PRIMARY Distance WHEEL Contact WHEEL Distance Helix-Helix Interaction 25L-49L 6.0 28L-45L 9.1 1-2 A 28L-83V 12.7 - - 1-3 P 45L-85L 9.3 49L-81L 8.1 2-3 A 51I-77L 9.3 - - 2-3 A

Results 1hta Contact Predictions subtract 0, max_contact 1 PRIMARY Contact PRIMARY Distance Helix-Helix Interaction 5I-28L 9.1 1-2 A 46L-62L 8.4 2-3 A

Results Contact Prediction Summary McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Summary Thesis: Topology of alpha helical globular proteins is based on inter-helical hydrophobic to hydrophobic contacts Validated on alpha helical globular proteins

Outline Protein structure prediction overview Predicting α-helical contacts Probability development Model Results Predicting α-helical contacts in α/β proteins Distance bounding Model Results Structure prediction of α-helical proteins Framework Results

Overview Problem α-helical topology prediction of globular α/β proteins Approach Predict/determine the secondary structure and β-sheet topology Establish bounds on inter-residue distances Apply novel optimization model (MILP) to maximize hydropobocity of interhelical interactions Helix Prediction -Detailed atomistic modeling -Simulations of local interactions (Free Energy Calculations) β-sheet Prediction -Novel hydrophobic modeling -Predict list of optimal topologies (Combinatorial Optimization) McAllister and Floudas. 2008, In preparation.

Establishing Distance Bounds Approach Use secondary structure location and β-sheet topology Develop local and non-local bounds (PDBSelect25) Local bounds based on residue separation and secondary structure

Establishing Distance Bounds Non-local Extended β-contacts Cross β-contacts

Tightening Distance Bounds Use of triangle inequality relationships Model is iteratively applied to determine tightest distance bounds

Objective Function Maximize number of hydrophobic interactions between α-helices and hydrophobicity PRIFT scale * Number Hydrophobicity α values are weight factors * Cornette et al. J Mol Biol. 1987, 195:659-85.

Constraints Residue contact constraints Residue i forms at most one contact with residue in helix n Residue i forms at most two contact Additional constraints limiting the size of allowed helix kinks Similar to constraints for α-helical topology prediction of α-helical proteins

Constraints Residue contact constraints Disallow (i,i+2), (i,i+5), and (i,i+6) residue pairs from both having contacts with helix n These residues exist on opposite faces of a helix i i+2 i+5 i+6

Constraints Helix contact constraints Maximum of 2 helix-helix contacts for a helix Only 1 helix-helix interaction direction Ensure feasible topologies Similar to constraints for α-helical topology prediction of α-helical proteins

Constraints Relating residue contacts to helix contacts Ensure consistent numbering i j k l

Constraints Relating distances to residue contacts If residue pair (i,j) forms an inter-helical contact, then d ij falls within contact distance If residue pair (i,j) does not form an inter-helical contact, d ij falls beyond contact upper bound

Constraints Distance constraints Satisfies initial bounds Satisfies triangle inequality constraints

Constraints Distance constraints Restrict distances based on right angle interaction assumption If residue pair (i,k) is an interhelical contact, line segment (i,k) is perpendicular to line segment (i,j) Relationship is used to bound the distance d jk

Results 1dcjA 1o2fB

Results 1bm8

Results - Summary Best average contact distance for 11 of 12 proteins in the test set was less than 11.0 Angstroms

Results CASP7 T350 Prediction with the optimal topology is shown

Outline Protein structure prediction overview Predicting α-helical contacts Probability development Model Results Predicting α-helical contacts in α/β proteins Distance bounding Model Results Structure prediction of α-helical proteins Framework Results

ASTRO-FOLD for α-helical Bundles Helix Prediction -Detailed atomistic modeling -Simulations of local interactions (Free Energy Calculations) Interhelical Contacts -Maximize common residue pairs -Rank-order list of topologies (MILP Optimization Model) Loop Structure Prediction -Dihedral angle sampling -Discard conformers by clustering (Novel Clustering Methodology) Derivation of Restraints -Dihedral angle restrictions -C α distance constraints (Reduced Search Space) Overall 3D Structure Prediction -Structural data from previous stages -Prediction via novel solution approach (Global Optimization and Molecular Dynamics) McAllister, Floudas. Proceedings, BIOMAT 2005.

Derivation of Restraints Dihedral angle restraints For residues with α-helix or β- sheet classification For loop residues using the best identified conformer from loop modeling efforts Distance restraints Helical hydrogen bond network (i,i+4) α-helical topology predictions β-sheet topology predictions Klepeis, JL and Floudas, CA. Journal of Global Optimization. (2003)

Constrained optimization Problem definition Atomistic level force field (ECEPP/3) Distance constraints

Tertiary Structure Prediction Hybrid global optimization approach αbb deterministic global optimization Conformational Space Annealing (CSA) Modifications/Enhancements Improved initial point selection using a torsion angle dynamics based annealing procedure from CYANA Inclusion of a rotamer optimization stage for quick energetic improvements Streamlined parallel implementation

αbb Global Optimization Based on a branch-and-bound framework Upper bound on the global solution is obtained by solving the full nonconvex problem to local optimality Lower bound is determined by solving a valid convex underestimation of the original problem Convergence is obtained by successive subdivision of the region at each level in the brand & bound tree Guaranteed ε-convergence for C 2 NLPs Floudas, CA and co-workers, 1995-2007 Adjiman, CS, et al. Computers and Chemical Engineering. (1998a,b)

Conformational Space Annealing Induce variations Mutations Crossovers Subject to local energy minimization Anneal through the gradual reduction of space Scheraga and co-workers, 1997-2007. Lee, JH, et al. Journal of Computational Chemistry. (1997)

Rotamer Side Chain Optimization Side chain packing is crucial to the stability and specificity of the native state Rotamer optimization is a quick way to alleviate steric clashes Better starting point for constrained nonlinear minimization

Torsion Angle Dynamics Why? Difficult to identify low energy feasible points Fast evaluation of steric based force field Unconstrained formulation with penalty functions Implemented by solving equations of motion as preprocessing for each constrained minimization Guntert, P, et al. Journal of Molecular Biology. (1997) Klepeis, JL, et al. Journal of Computational Chemistry. (1999) Klepeis, JL and Floudas, CA. Computers and Chemical Engineering. (2000)

Hybrid Global Optimization Algorithm All secondary nodes begin performing αbb iterations Once the CSA bank is full, CSA takes control of a subset of secondary nodes αbb CSA work Torsion Rotamerangle optimization dynamics αbb Control CSA Control Idle Work Rotamer Minimization optimization of CSA trial conformation Minimization of lower bounding function Minimization of upper bounding function CSA αbb Idle work control Maintains Performs shear CSA list of bank movements lower bounding and subregions perturbations on CSA Maintains Tracks structures overall queue upper of αbb and minima lower bounds for bank increases Handles Only Defines executed branching bank updates during directions idle time of primary processor Sends and receives work to and from CSA αbb work nodes Primary processor Secondary processors McAllister and Floudas. 2007, Submitted for publication.

Results Tertiary Structure Prediction PDB: 1nre Energy -1395.48 RMSD 6.63 Energy -1340.45 RMSD 3.52 Lowest energy predicted structure of 1nre (color) versus native 1nre (gray) Lowest RMSD predicted structure of 1nre (color) versus native 1nre (gray)

Results Tertiary Structure Prediction PDB: 1hta Energy -941.02 RMSD 6.70 Energy -915.57 RMSD 2.58 Lowest energy predicted structure of 1hta (color) versus native 1hta (gray) Lowest RMSD predicted structure of 1hta (color) versus native 1hta (gray)

Results Blind Tertiary Structure Prediction (Collaboration with Michael Hecht) S836 Energy -1740.11 RMSD 2.84 Energy 1697.88 RMSD 2.39 Lowest energy predicted structure of s836 (color) versus native s836 (gray) Lowest RMSD predicted structure of s836 (color) versus native s836 (gray)

Conclusions Two novel mixed-integer linear programming models were developed for α-helical topology prediction in α-helical proteins PRIMARY and WHEEL contacts For all 26 test α-helical proteins, best average contact distance predictions fell well below 11.0 Å A novel mixed-integer linear programming model was aslo developed for α-helical topology prediction in α/β proteins For 11 of 12 test α/β proteins, best average contact distance predictions fell below 11.0 Å Topology predictions were useful for restraining the tertiary structures during global optimization and obtaining a near-native predictions in a blind study

Acknowledgements Funding sources National Institutes of Health (R01 GM52032) US EPA (GAD R 832721-010)* *Disclaimer: This work has not been reviewed by and does not represent the opinions of the funding agency.

Questions