clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons

clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons Debswapna Auburn University ACM-BCB August 31, 2018

What is protein decoy clustering? Clustering groups similar items together into clusters. Applied to the decoys from a folding simulation, it identifies the most populated conformational states.


Clustering and the protein folding landscape. [Figure: folding landscape with labels Energy, Entropy, and N.] There are more ways to be incorrect than to be correct.

Clustering-based nativeness score? The decoy with the highest average pairwise similarity marks the potential near-native basin. Computing the average pairwise similarity score requires O(n^2) pairwise comparisons.
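The consensus idea on this slide can be sketched in a few lines. This is a minimal illustration, not the clustq implementation: `consensus_scores` and `toy_similarity` are hypothetical names, and the decoys are stand-in numbers rather than real structures. Any symmetric similarity function could be plugged in.

```python
def consensus_scores(decoys, similarity):
    """Average pairwise similarity of each decoy to all other decoys.

    Uses n * (n - 1) / 2 similarity calls by exploiting symmetry,
    which is still O(n^2) in the number of decoys.
    """
    n = len(decoys)
    if n < 2:
        raise ValueError("need at least two decoys")
    totals = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            s = similarity(decoys[i], decoys[j])
            totals[i] += s
            totals[j] += s
    return [t / (n - 1) for t in totals]


def toy_similarity(a, b):
    """Hypothetical stand-in similarity in (0, 1]; not a structural score."""
    return 1.0 / (1.0 + abs(a - b))


# Three decoys sit close together (a populated basin); one is an outlier.
scores = consensus_scores([0.0, 0.1, 0.2, 5.0], toy_similarity)
best = max(range(len(scores)), key=scores.__getitem__)
```

The decoy in the middle of the dense group receives the highest consensus score, and the outlier receives the lowest.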

Pairwise comparison via structural alignment: finding the optimal structural alignment is itself an optimization problem, and it yields a similarity score such as TMscore or GDT. Running O(n^2) alignment-based comparisons is computationally expensive.

Q-score: alignment-free comparison. $\{r_{ij}^{1}\}$: internal distance matrix of protein 1; $\{r_{ij}^{2}\}$: internal distance matrix of protein 2. $Q_{ij} = \exp\left[-\left(r_{ij}^{1} - r_{ij}^{2}\right)^{2}\right]$. $Q_{ij} \approx 1$ for very good similarity; $Q_{ij} \approx 0$ for poor similarity. (Sussman and coworkers, 2009)
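A minimal sketch of the per-pair term follows. Several details are assumptions that the slide does not state: one representative point per residue (for example a C-alpha atom), all pairs i < j included, and the per-pair values averaged into a single score. The function names are hypothetical.

```python
import math


def internal_distances(coords):
    """Return {(i, j): distance} for all residue pairs i < j."""
    n = len(coords)
    return {(i, j): math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)}


def q_score(coords_a, coords_b):
    """Mean of Q_ij = exp(-(r_ij^1 - r_ij^2)^2) over all pairs i < j.

    Averaging over pairs is an assumption of this sketch.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must have the same number of residues")
    r_a = internal_distances(coords_a)
    r_b = internal_distances(coords_b)
    if not r_a:
        raise ValueError("need at least two residues")
    return sum(math.exp(-(r_a[p] - r_b[p]) ** 2) for p in r_a) / len(r_a)


# Toy coordinates in angstroms: a bent chain, a translated copy, a straight chain.
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in bent]
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
```

Because only internal distances enter the score, `q_score(bent, shifted)` is 1 up to rounding even though the two coordinate sets were never superposed, while `q_score(bent, straight)` is lower.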

WQ-score: weighted internal distance comparison.
Qnarrow: $|i - j| < 6$
Qshort: $6 \le |i - j| < 12$
Qmedium: $12 \le |i - j| < 24$
Qlong: $|i - j| \ge 24$
The sequence separations are inspired by protein contact map prediction.
$\text{WQ-score} = (1 \times Q_{narrow} + 2 \times Q_{short} + 4 \times Q_{medium} + 8 \times Q_{long}) / 15$
Long-range interactions carry more information about the protein fold.
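The weighting scheme can be sketched as below. The bin boundaries and the weights 1, 2, 4, 8 come from the slide; everything else is an assumption of this sketch: one point per residue, each bin's Q taken as the mean of exp(-(difference in internal distance)^2) over the pairs in that bin, and an empty bin (chains shorter than 25 residues) contributing zero. The names `separation_bin` and `wq_score` are hypothetical.

```python
import math

# Weights from the slide; they sum to 15.
WEIGHTS = {"narrow": 1, "short": 2, "medium": 4, "long": 8}


def separation_bin(i, j):
    """Map a residue pair to its sequence-separation bin."""
    sep = abs(i - j)
    if sep < 6:
        return "narrow"
    if sep < 12:
        return "short"
    if sep < 24:
        return "medium"
    return "long"


def wq_score(coords_a, coords_b):
    """Weighted combination of per-bin mean Q values, divided by 15."""
    n = len(coords_a)
    if n != len(coords_b):
        raise ValueError("both structures must have the same number of residues")
    totals = {name: 0.0 for name in WEIGHTS}
    counts = {name: 0 for name in WEIGHTS}
    for i in range(n):
        for j in range(i + 1, n):
            diff = (math.dist(coords_a[i], coords_a[j])
                    - math.dist(coords_b[i], coords_b[j]))
            name = separation_bin(i, j)
            totals[name] += math.exp(-diff ** 2)
            counts[name] += 1
    # Assumption: a bin with no pairs contributes zero to the weighted sum.
    weighted = sum(WEIGHTS[name] * totals[name] / counts[name]
                   for name in WEIGHTS if counts[name])
    return weighted / sum(WEIGHTS.values())


# Toy 30-point helix-like curve and a uniformly stretched copy.
helix = [(math.cos(i), math.sin(i), 0.5 * i) for i in range(30)]
stretched = [(1.5 * x, 1.5 * y, 1.5 * z) for x, y, z in helix]
```

Comparing `helix` with itself gives 1, and comparing it with `stretched` gives a lower value, with the long-range pairs penalized most heavily because they carry the largest weight.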

clustq: What's conceptually new? Rapid clustering-based consensus scoring using the alignment-free average pairwise WQ-score.

Results 1/4: WQ-score vs. alignment-based scores. Comparisons with popular alignment-based scores (TMscore, GDT-TS). Datasets: Modeller set (20 proteins), Rosetta set (58 proteins). Measures: Pearson's correlation, Spearman correlation.
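For readers unfamiliar with the two measures, here is a standard-library sketch of both. It is a generic illustration with made-up numbers, not the evaluation code or data behind these slides. Spearman correlation is computed as Pearson correlation on ranks, with tied values given their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up example scores for four decoys under two different metrics.
metric_a = [0.10, 0.40, 0.50, 0.90]
metric_b = [0.20, 0.50, 0.70, 0.95]
```

The two example lists are ordered identically, so their Spearman correlation is 1 even though the values are not linearly related.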

Modeller set: very well correlated (> 0.97) with alignment-based scores.

Rosetta set: well correlated (~0.8) with alignment-based scores.

Results 2/4: clustq vs. alignment-based consensus scoring. Clustering-based consensus scoring with alignment-based scores (TMscore, GDT-TS). "Stage 2" datasets: CASP11 (80 proteins), CASP12 (40 proteins). Measures: Pearson's correlation, Spearman correlation.

CASP11 "stage 2" set: better than TMscore-based clustering, comparable to GDT-TS-based clustering.

CASP12 "stage 2" set: comparable to alignment-based consensus scoring.

Computational efficiency of clustq vs. TMscore. [Figure: panels for CASP11 and CASP12.]

clustq is 5.2 times faster than alignment-based consensus scoring.

Results 3/4: clustq vs. top consensus-based approaches. Top consensus scoring methods: Pcons, APOLLO. Full datasets: CASP11 (80 proteins), CASP12 (40 proteins). Measures: Pearson's correlation, Spearman correlation.

Results 3/4: clustq vs. top consensus-based approaches. [Figure: two panels, Pearson and Spearman, each comparing clustq, Pcons, and APOLLO on CASP11 and CASP12; y-axis labeled "Avg. Pearson correlation w.r.t. GDT-TS".] Consistently better performance compared to top methods.

Can the clustq score estimate target difficulty?

Results 4/4: Computational Efficiency of clustq vs. TMscore. [Figure: panels for CASP11 and CASP12.]

If clustq_score > 0.4: easy target (homology-based); else: hard target (homology-free).
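The rule on this slide translates directly into a small function. The 0.4 cutoff and the two labels come from the slide; the function name, the constant name, and the assumption that a single clustq score per target has already been computed are illustrative.

```python
# Cutoff taken from the slide; a score exactly at the cutoff counts as hard.
EASY_THRESHOLD = 0.4


def target_difficulty(clustq_score, threshold=EASY_THRESHOLD):
    """Label a target as easy (homology-based) or hard (homology-free)."""
    if clustq_score > threshold:
        return "easy target (homology-based)"
    return "hard target (homology-free)"


labels = [target_difficulty(s) for s in (0.75, 0.40, 0.12)]
```

Keeping the threshold as a parameter makes it easy to test how sensitive the easy/hard split is to the chosen cutoff.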

clustq online: http://watson.cse.eng.auburn.edu/clustq/

Conclusions: an alignment-free weighted internal distance comparison metric, well correlated with alignment-based metrics; ultra-fast clustering-based consensus scoring with comparable or better performance; could be employed for estimating target difficulty; freely available to the community.

Acknowledgements: Rahul Alapati, Auburn University
