Human and Server CAPRI Protein Docking Prediction Using LZerD with Combined Scoring Functions. Daisuke Kihara

Daisuke Kihara
Department of Biological Sciences, Department of Computer Science
Purdue University, Indiana, USA
http://kiharalab.org

CAPRI Round 30 Results (Lensink et al., CAPRI30 group paper, 2016)

Overview of Protein Docking Prediction Using LZerD in CAPRI
Single-chain modeling: HHPred, MUFold, TASSER, SparksX, TASSERlite, Phyre2, MultiCom
PRESCO -> sub-unit models
LZerD -> ~50,000 docking models
Clustering, RMSD < 5 Å
Re-ranking with scoring functions -> 10 models
MD relaxation
Submit
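The clustering step (RMSD < 5 Å) can be illustrated with a minimal sketch. It assumes each docking model is a list of ligand atom coordinates expressed in a common receptor frame, and that the models are already sorted best-first by some score. The greedy scheme and the names `coordinate_rmsd` and `greedy_cluster` are assumptions made for illustration, not necessarily the clustering used in the LZerD pipeline.

```python
import math

def coordinate_rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) points, no superposition."""
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))

def greedy_cluster(models, cutoff=5.0):
    """Return indices of cluster representatives.

    models: list of coordinate lists, assumed sorted best-first.
    A model is absorbed by an existing cluster if it lies within `cutoff`
    of that cluster's representative; otherwise it starts a new cluster.
    """
    representatives = []
    for i, coords in enumerate(models):
        if all(coordinate_rmsd(coords, models[r]) >= cutoff for r in representatives):
            representatives.append(i)
    return representatives

def shifted(points, delta):
    """Translate every point by `delta` along each axis (toy helper)."""
    return [(x + delta, y + delta, z + delta) for x, y, z in points]

# Toy example: three poses of a hypothetical 4-atom ligand.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
models = [base, shifted(base, 1.0), shifted(base, 10.0)]
print(greedy_cluster(models))
```

In this toy case the second pose is about 1.7 Å from the first and is absorbed, while the third is far away and becomes its own representative.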

LZerD ("Lizard"): Local 3D Zernike descriptor-based Docking program
[Figure labels: normal vector, 6 Å, 3D Zernike descriptor]
(Venkatraman, Yang, Sael, & Kihara, BMC Bioinformatics, 2009)

3D Zernike Descriptors (3DZD)
An extension of spherical harmonics-based descriptors.
A 3D object can be represented by a series of orthogonal functions, and thus in practice by a series of coefficients used as a feature vector.
Compact. Rotation invariant.
3D Zernike functions: $Z_{nl}^{m}(r, \vartheta, \varphi) = R_{nl}(r)\, Y_{l}^{m}(\vartheta, \varphi)$
$Y_{l}^{m}(\vartheta, \varphi)$: spherical harmonics; $R_{nl}(r)$: radial functions; $Z_{nl}^{m}(r, \vartheta, \varphi)$: polynomials in Cartesian coordinates.
Zernike moments: $\Omega_{nl}^{m} = \frac{3}{4\pi} \int_{|\mathbf{x}| \le 1} f(\mathbf{x})\, \overline{Z_{nl}^{m}(\mathbf{x})}\, d\mathbf{x}$
Zernike descriptor: $F_{nl} = \sqrt{\sum_{m=-l}^{l} \left(\Omega_{nl}^{m}\right)^{2}}$
[Figure: a surface representation of 1ew0A (A) is reconstructed from its 3D Zernike invariants of order 5, 10, 15, 20, and 25 (B-F).] (Sael & Kihara, 2009)
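The step from Zernike moments to the Zernike descriptor can be sketched in a few lines. The sketch assumes the moments are already computed and stored in a dictionary keyed by (n, l, m), and it reads the squared term in the descriptor formula as a squared modulus, since the moments are complex-valued. The moment values are made up, and `zernike_descriptor` and `rotate_about_z` are hypothetical names. Only the simplest case of rotation invariance is shown: a rotation about the z axis, which multiplies each moment by a phase depending on m.

```python
import cmath
import math

def zernike_descriptor(moments):
    """Collapse moments {(n, l, m): complex} into invariants {(n, l): float}."""
    sums = {}
    for (n, l, m), value in moments.items():
        sums[(n, l)] = sums.get((n, l), 0.0) + abs(value) ** 2
    return {key: math.sqrt(total) for key, total in sums.items()}

def rotate_about_z(moments, alpha):
    """Apply the m-dependent phase induced by a rotation about z.

    The sign convention of the phase is an assumption; the invariants
    do not depend on it.
    """
    return {(n, l, m): v * cmath.exp(-1j * m * alpha)
            for (n, l, m), v in moments.items()}

# Made-up moments for (n, l) = (2, 2); the values are illustrative only.
moments = {
    (2, 2, -2): 0.3 - 0.4j,
    (2, 2, -1): 0.1j,
    (2, 2, 0): 0.5 + 0.0j,
    (2, 2, 1): -0.1j,
    (2, 2, 2): 0.3 + 0.4j,
}
before = zernike_descriptor(moments)
after = zernike_descriptor(rotate_about_z(moments, 0.7))
print(before[(2, 2)], after[(2, 2)])
```

The two printed values agree up to floating-point error, because the phase factors cancel in the squared modulus. General rotations mix the moments within each (n, l) block, which this sketch does not model.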

Protein Residue Environment SCOre (PRESCO)
[Figure: residue environment centered along the main chain, within a sphere of 6 or 8 Å]
(Kim & Kihara, Proteins 2014)

Finding Similar Side-Chain Depth Environment (SDE) from a database
Query SDE; structure database of 2536 proteins.
Take the 500 lowest-RMSD fragments of 9 side-chain centroids, superimposed with the query fragment.
Select SDEs with the same number of side-chain centroids in the sphere of 8.0 Å.
Compute the RMSD of residue depth for corresponding side-chain centroids.
Sort by depth RMSD to the query.
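The last three steps of the SDE search (filter by centroid count, compute depth RMSD, sort) can be sketched as follows. The data layout is an assumption: each environment is reduced to a list of residue depths, and centroids are taken to correspond by list position, which in practice would follow from the fragment superposition. The labels and depth values are invented for illustration.

```python
import math

def depth_rmsd(query_depths, candidate_depths):
    """RMSD between two equally long lists of residue depths (in Å)."""
    squared = sum((q - c) ** 2 for q, c in zip(query_depths, candidate_depths))
    return math.sqrt(squared / len(query_depths))

def rank_environments(query_depths, database):
    """Rank database environments by depth RMSD to the query.

    database: list of (label, depths) pairs (hypothetical layout).
    Environments with a different number of centroids are skipped.
    Returns a list of (rmsd, label) sorted by ascending RMSD.
    """
    same_size = [(label, depths) for label, depths in database
                 if len(depths) == len(query_depths)]
    return sorted((depth_rmsd(query_depths, depths), label)
                  for label, depths in same_size)

query = [2.0, 3.5, 5.0]
database = [
    ("envA", [2.0, 3.5, 5.0]),
    ("envB", [4.0, 3.5, 5.0]),
    ("envC", [2.0, 3.0]),        # different centroid count, skipped
    ("envD", [2.5, 3.5, 5.0]),
]
ranked = rank_environments(query, database)
for rmsd, label in ranked:
    print(label, round(rmsd, 3))
```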

CASP11 Free Modeling Category Ranking (Model 1)
(http://www.predictioncenter.org/casp11/zscores_final.cgi?formula=assessors)
(Kim & Kihara, Proteins 2015)

DFIRE, GOAP, ITScore Scoring Functions
DFIRE (Yaoqi Zhou): statistical distance-dependent atom contact potential using the finite ideal-gas reference state.
GOAP (Jeff Skolnick): DFIRE combined with an orientation-dependent term.
ITScore (Xiaoqin Zou): iteratively refined statistical distance-dependent atom contact potential.

The BindML Algorithm (La D, & Kihara D, Proteins 2012)

Generating Substitution Models
[Figure: substitution models generated from ipfam (505 families)]

ipfam Dataset Benchmark
ROC based on 449 protein complexes

BindML Webserver
http://kiharalab.org/bindml
(Wei Q, La D, & Kihara D, Methods in Mol. Biol., in press, 2016)

T79 (Round 30) (Interface 2)
Kihara: 3 hits; LZerD: 1 hit
Homodimer
LZerD runs:
  No-interface prediction
  With BindML-consPPISP prediction
LZerD selection strategy:
  Consensus of ITScore and GOAP
  5 from no-interface, 5 from BindML-consPPISP
Kihara selection strategy:
  Manual combination of ITScore, GOAP, DFIRE, and PRESCO
  10 from no-interface
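One simple way to form a consensus of two scoring functions is to sum the per-score ranks and keep the models with the lowest totals. The sketch below illustrates that idea only. The slides do not state how the consensus was computed, so the rank-sum rule, the function names, and the score values are all assumptions, as is the convention that a lower (more negative) score is better.

```python
def ranks(scores):
    """Map model id -> rank, where rank 1 is the lowest (best) score."""
    ordered = sorted(scores, key=scores.get)
    return {model: position + 1 for position, model in enumerate(ordered)}

def consensus_top(score_tables, n):
    """Pick the n models with the smallest summed rank across score tables.

    score_tables: list of dicts {model id: score}, all over the same models.
    Ties in summed rank are broken by model id to keep the output stable.
    """
    per_score = [ranks(table) for table in score_tables]
    models = list(score_tables[0])
    total = {m: sum(r[m] for r in per_score) for m in models}
    return sorted(models, key=lambda m: (total[m], m))[:n]

# Invented scores for four hypothetical docking models.
itscore = {"m1": -120.0, "m2": -150.0, "m3": -90.0, "m4": -140.0}
goap = {"m1": -9000.0, "m2": -8800.0, "m3": -7000.0, "m4": -9500.0}
print(consensus_top([itscore, goap], 2))
```

A selection such as "5 from no-interface, 5 from BindML-consPPISP" would then amount to calling a function like this once per docking run and concatenating the results.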

T79 Subunit Model Quality
Chain A RMSD: 4.0 Å
Chain B RMSD: 4.0 Å
[Figure: native vs. model]

T79 Human Selected Model
fnat 0.16, L-RMSD 14.1 Å, I-RMSD 3.8 Å
[Figure: native vs. model]
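The three quality measures quoted for this model (fnat, L-RMSD, I-RMSD) are what CAPRI uses to grade a prediction. The sketch below encodes the thresholds as they are commonly stated for CAPRI assessment. They are reproduced from memory rather than from these slides, so they should be checked against the assessors' definition before being relied on. The function name `capri_class` is hypothetical.

```python
def capri_class(fnat, l_rmsd, i_rmsd):
    """Classify a docking model from fnat, L-RMSD (Å), and I-RMSD (Å).

    Thresholds are the commonly quoted CAPRI criteria (an assumption here):
    each class needs a minimum fnat and either RMSD under its limit.
    """
    if fnat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return "acceptable"
    return "incorrect"

# Values quoted for the T79 human-selected model.
print(capri_class(0.16, 14.1, 3.8))
```

Under these thresholds the T79 model qualifies through its I-RMSD (3.8 Å, within the 4.0 Å limit) even though its L-RMSD is above 10 Å.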

T79 Interface Prediction
Method      Precision  Recall  F-score
BindML      0          0       NA
Cons-PPISP  0.10       0.18    0.12
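Precision, recall, and F-score for an interface prediction follow from the overlap between predicted and native interface residues. The sketch below assumes residues are identified by plain residue numbers. The residue sets are invented, and `interface_metrics` is a hypothetical name. When there is no overlap the F-score is undefined and returned as None, matching the "NA" entries in the table.

```python
def interface_metrics(predicted, native):
    """Return (precision, recall, f_score) for predicted interface residues.

    predicted, native: iterables of residue identifiers.
    f_score is None when no predicted residue is in the native interface.
    """
    predicted, native = set(predicted), set(native)
    true_positives = len(predicted & native)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(native) if native else 0.0
    if true_positives == 0:
        return precision, recall, None
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Invented example: 5 predicted residues, 4 native interface residues, 2 shared.
precision, recall, f_score = interface_metrics([10, 11, 12, 13, 14], [12, 13, 20, 21])
print(precision, recall, round(f_score, 2))
```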

T79 Scores (no-interface prediction)
[Figure: panels for GOAP, DFIRE, and ITScore; axes I-RMSD, L-RMSD, fnat]

T79 Score Comparison
[Figure: pairwise comparison of GOAP, DFIRE, and ITScore]

T79 PRESCO Scores
[Figure: PRESCO score vs. L-RMSD; panels: with interface prediction, without interface prediction]

T79 Score Performance Summary
Run               Score    RFH      Hits in top 10
no-interface      ITScore  1 (62)   3
no-interface      GOAP     1 (72)   3
no-interface      DFIRE    1 (111)  5
BindML-consPPISP  all      -        -
RFH: rank of first acceptable (medium) hit
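The two summary columns, RFH and hits in the top 10, can be computed from a scored decoy list in a few lines. The sketch assumes each model is a (score, is_hit) pair and that a lower score is better. The data are invented, and `score_summary` is a hypothetical name.

```python
def score_summary(models, top_n=10):
    """Return (rank of first hit, number of hits among the top_n models).

    models: list of (score, is_hit) pairs; lower score ranks first.
    The rank is 1-based, and None if the list contains no hit.
    """
    ordered = sorted(models, key=lambda model: model[0])
    first_hit = next(
        (rank for rank, (_, is_hit) in enumerate(ordered, start=1) if is_hit),
        None,
    )
    hits_in_top = sum(1 for _, is_hit in ordered[:top_n] if is_hit)
    return first_hit, hits_in_top

# Invented decoy set: five models, two of which are hits.
decoys = [(-5.0, False), (-9.0, False), (-7.0, True), (-1.0, True), (-8.0, False)]
print(score_summary(decoys, top_n=3))
print(score_summary(decoys))
```

Running the same function with a stricter hit definition (for example, medium quality or better) would give the parenthesized ranks in the table.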

T91 (Round 30)
Kihara: 8 hits; LZerD: 2 hits
Homodimer
LZerD runs:
  No-interface prediction (with our monomer model)
  With BindML+consPPISP interface prediction
  Zhang1 CASP server model, no-interface prediction
Server selection strategy:
  10 from no-interface
Human selection strategy:
  Consensus of ITScore, GOAP, PRESCO, and visual inspection
  5 from no-interface, 5 from Zhang1

T91 Subunit Models
Chain C: our model RMSD 6.0 Å; Zhang: RMSD 4.9 Å
Chain D: our model RMSD 6.5 Å; Zhang: RMSD 5.7 Å
[Figure: native, our model, Zhang1]

T91 Human Selected Model
fnat 0.33, L-RMSD 9.0 Å, I-RMSD 4.2 Å
[Figure: model vs. native]

T91 Interface Prediction
Method      Precision  Recall  F-score
BindML      0.64       0.20    0.30
Cons-PPISP  0.50       0.28    0.36

T91 Scores (no-interface prediction)
[Figure: panels for GOAP, DFIRE, and ITScore; axes I-RMSD, L-RMSD, fnat]

T91 Scores (with interface prediction)
[Figure: panels for GOAP, DFIRE, and ITScore; axes I-RMSD, L-RMSD, fnat]

T91 Scores (Zhang models)
[Figure: panels for GOAP, DFIRE, and ITScore; axes L-RMSD, fnat, I-RMSD]

T91 Zhang1 Score Comparison
[Figure: pairwise comparison of GOAP, DFIRE, and ITScore]

T91 PRESCO Scores
[Figure: PRESCO score vs. L-RMSD; panels: docking with Zhang models, without interface prediction]
Top 5 models selected from each

T91 Score Performance Summary
Run           Score    RFH     Hits in top 10
no-interface  ITScore  2       2
no-interface  GOAP     2       1
no-interface  DFIRE    1       2
interface     ITScore  1042    0
interface     GOAP     165     0
interface     DFIRE    116     0
zhang1        ITScore  1 (4)   5
zhang1        GOAP     2 (16)  5
zhang1        DFIRE    1 (6)   6
RFH: rank of first acceptable (medium) hit

T96 (Round 31)
Heterodimer
Predictor hits: 0 (5 by other groups)
Scorer hits: human 1, server 0 (1 by other group)
Human: 6 selected by PRESCO; 4 selected from the run with predicted interface using ITScore, GOAP, and DFIRE
No PDB file for the native structure was available: metrics were computed using the two scorer hits (average L-RMSD/I-RMSD, max fnat)

T96 Scorer Hits
S31.M06 (Kihara): fnat 0.32, L-RMSD 7.99 Å, I-RMSD 2.67 Å
S39.M03 (Haliloglu): fnat 0.22, L-RMSD 5.68 Å, I-RMSD 2.44 Å
[Figure labels: Chain A, Chain B]

T96 Interface Prediction
Chain  Method       Precision  Recall  F-score
A      BindML       0.15       0.2     0.17
A      Cons-PPISP   0          0       NA
B      BindML       0.12       0.11    0.12
B      Cons-PPISP*  NA         NA      NA
*Cons-PPISP predictions were only for the N-terminal tail; visual inspection suggests that the N-terminal tail is not a likely binding site, so these predictions were not used.

T96 Scorer-Models Scores
[Figure: panels for GOAP, DFIRE, and ITScore; axes I-RMSD, L-RMSD, fnat]

T96 Score Performance Summary
Score    RFH  Hits in top 10
ITScore  529  0
GOAP     6    1
DFIRE    125  0
RFH: rank of first acceptable hit
The hit for GOAP/DFIRE is the same model picked by PRESCO

Summary
Our docking prediction procedure runs LZerD; decoys were selected by combining DFIRE, ITScore, GOAP, and PRESCO.
Binding sites were predicted by BindML and cons-PPISP.
On the examples shown, PRESCO's performance was not as spectacular as we expected from its performance on single-chain structure prediction.
DFIRE, ITScore, and GOAP showed similar, reasonably good performance.
Scoring function performance depends on subunit model quality.
The way BindML predictions are used needs to be improved.

Lab Members
Hyung-Rae Kim, Lenna Peterson
@kiharalab