Protein Structure Prediction Michael Feig MMTSB/CTBP 2006 Summer Workshop
From Sequence to Structure SEALGDTIVKNA
Ab initio Structure Prediction Protocol Amino Acid Sequence Conformational Sampling to generate native-like structures Scoring & Clustering to identify most native-like structures 3D Structure
Folding with All-Atom Models AAQAAAAQAAAAQAA CHARMM force field Implicit solvent replica exchange simulations 8 replicas, 10 ns/replica
Folding with Low-Resolution Model EQQNAFYEILHLPNLNEEQRNGFIQSLKDDPSQSANLLAEAKKLNDAQA SICHO model, MONSSTER simulated annealing run
SICHO Lattice Model Monte Carlo simulations: Thr Leu > Attempt move > Compute ΔE Phe > Accept with probability p: Asp %1 "E # 0 p =& 'exp($"e / k BT ) "E > 0 Simulated annealing! MMTSB/CTBP Summer Workshop Constant Temperature Replica Exchange Sampling Kolinski & Skolnick: Proteins 32, 475 (1998) Michael Feig, 2006.
Excluded volume SICHO Energy Function Knowledge-Based Terms Side chain burial propensity follows Kyte-Doolittle scale Centrosymmetric bias r g = 2.2 N 0.38 res Ala Arg Asp Ile 1.8-4.5-3.5 4.5
SICHO Energy Function Statistical Terms Potential of mean force (PMF): p " #E ij i kbt = e p j energy 2.5 2.0 1.5 1.0 0.5 0.0-0.5-1.0-1.5 "E = #kt ln(p) extended 0 1 2 3 4 5 6 7 8 9 10 11 GLU-GLU r(i,i+4) in Å helix
Conformational Sampling with SICHO All-Atom Energy in kcal/mol Protein A -3200-3250 -3300-3350 -3400 1 2 3 4 5 6 7 8 9 10 11 12 RMSD from native in Å MMTSB/CTBP Summer Workshop Michael Feig, 2006.
Ab initio Structure Prediction Protocol Amino Acid Sequence Secondary Structure Prediction PSIPRED et al. Efficient Sampling e.g. MONSSTER/SICHO All-Atom Reconstruction Scoring & Clustering e.g. MMGB/SA, DFIRE 3D Structure
Secondary Structure Prediction GDPIVKNAKLDSRLANKEALRLL?
Secondary Structure Propensities α-helix β-sheet turn Chou & Fasman (1974)
Secondary Structure Prediction Methods SABLE PSIPRED PSSP SAM-T99-sec PHD C+F 77.6% 76.2% 75.1% 76.1% 72.3% 50-60% C+F: Chou & Fasman GOR: Garnier, Osguthorpe, Robson
Secondary structure prediction Per-Residue Accuracy
Realistic ab initio Structure Prediction -2400-2450 CHARMM22/GBMV. -2500-2550 -2600-2650 -2700-2750 6 7 8 9 10 11 12 13 14 C! RMSD in Å (62 residues)
Sampling, Scoring, Clustering -2400-2450 CHARMM22/GBMV. -2500-2550 -2600-2650 -2700-2750 6 7 8 9 10 11 12 13 14 C! RMSD in Å (62 residues)
Ab initio Predictions DNase fragmentation factor NMR structure 1KOY Best-scoring prediction 7.4 Å RMSD
Scoring Functions Knowledge-based/statistical derived from known protein structures limited by training data usually fast e.g. DFIRE, RAPDF, prosaii Force field based model physical energy landscape more robust and transferable often expensive (require minimization) e.g. MMPB(GB)/SA, UNRES
-3500 Scoring Function Comparison MMGB/SA vs. DFIRE -8000 MMGB/SA score -3600-3700 -3800-3900 -4000-4100 DFIRE score -8500-9000 -9500-10000 -4200-10500 5.5 6 6.5 7 7.5 8 8.5 5.5 6 6.5 7 7.5 8 8.5 C! RMSD in Å 15.8 15.6 15.4 15.2 15 14.8 14.6 radius of gyration C! RMSD in Å 14.4 14.2 5.5 6 6.5 7 7.5 8 8.5 C! RMSD in Å
Sampling with Restraints Secondary structure bias Secondary structure prediction NMR shift data Distance restraints Experimental restraints (disulfides, NMR, EPR) Side chain contacts from analogous structures Shape restraints cryoem data, small-angle X-ray scattering
but the solution may lie elsewhere.
Sequence Homology Human thioredoxin (1AUC) SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTV A : :....:..::: : :::::::: :.....:.... MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL - KLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA...:..:....::..::.:. :::.: : :: :.:. :. EVDVDDCQDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV E. Coli thioredoxin (1THO)
Comparative Modeling Assumption: Proteins with similar sequence have similar structure Human thioredoxin (1AUC) E. Coli thioredoxin (1THO)
Structural Templates from Homology Challenges: Correct alignment Loop modeling Side chain rebuilding PGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA.:....::..::.:. :::.: : :: :.:. :. QDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV
Accuracy of Predictions by Homology 100 % predictions 80 60 40 20 0 20 30 40 50 60 70 80 90 100 % sequence identity <2Å RMSD <5Å RMSD
Prediction through Fold Recognition Assumption: Proteins with similar secondary structure share fold 1N91 1JRM
Templates through Fold Recognition Challenges: Wrong templates Alignment uncertain Fragment modeling Refinement needed
Ab initio Sampling in Template-based Structure Prediction Template provides known protein structure Ab initio sampling of unknown fragments in the context of template
Template Restraints Near Flexible Part 0.0 0.1 0.4 0.7 0.2 1.0 Restraint potential: U = f " k (r # r 0 ) 2
Loop Sampling Methods # Residues 1-2 1-3 2-12 5-30 2-100 Sampling All-Atom Reconstruction Exhaustive Search Torsional Space MC/MD Multi-Scale Fragment-based Program MMTSB Tool Set Modeller (Sali) MMTSB Tool Set Rosetta (Baker)
Structural Genomics Efforts
Structure Refinement? predicted native (NMR)