Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein-DNA Interactions

Jamie Duke (1, 2) and Carlos Camacho (3)
(1) Bioengineering and Bioinformatics Summer Institute, Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15261
(2) Department of Biological Sciences, Rochester Institute of Technology, Rochester, NY 14623
(3) Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15261

Goals
- Investigate the diversity of the EGR family of proteins.
- Carry out homology modeling between resolved structures and known human EGR proteins.
- Test the structures with protein-DNA docking algorithms to determine the specific protein-DNA interactions.

Background Information

Zinc Fingers
- Nucleic-acid-binding domain.
- Classic C2H2 conformation coordinating a zinc ion.
- Conserved pattern: x-C-x(1-5)-C-x(12)-H-x(3-6)-H.
- A conserved aromatic residue.
- 24-residue ββα motif.
- Multiple domains are used to recognize specific DNA sequences.
- The most commonly studied family is the EGR family, with 2 or 3 zinc finger domains. Its members are also known as Zif268, Nerve Growth Factor Induced Protein, and Krox proteins.
Referenced from Pfam Acc. No. PF00096 (http://www.sanger.ac.uk/cgi-bin/pfam/getacc?pf00096).

Zinc Finger Binding
- Each finger recognizes 3 nucleotides.
- Recognition occurs in the α-helix of the finger.
- The recognition sites of the 3 fingers overlap.
- The DNA binding site can be changed by mutating the protein.

[Figure: DNA recognition by Fingers 1, 2 and 3 of 1AAY and 1G2D, showing helix positions -1, 3 and 6 of each finger against the DNA bases. Adapted from Paillard et al., Fig. 1A and 1B.]
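The conserved pattern listed above can be tried out directly as a regular expression. The sketch below is illustrative only. The translation of x-C-x(1-5)-C-x(12)-H-x(3-6)-H into a Python regex and the helper name `find_c2h2_fingers` are our own assumptions. Pfam itself assigns zf-c2h2 domains with a profile HMM rather than a pattern. The scan uses the Zif268 (1AAY) sequence from the homology-modeling section.

```python
import re

# Pattern x-C-x(1-5)-C-x(12)-H-x(3-6)-H from the text, written as a regex.
# Assumption: '.' stands for any residue; finditer returns non-overlapping matches.
C2H2_PATTERN = re.compile(r".C.{1,5}C.{12}H.{3,6}H")

# Zif268 sequence from structure 1AAY (as shown in the alignment section).
ZIF268_1AAY = (
    "MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS"
    "RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD"
)


def find_c2h2_fingers(seq):
    """Return (start, end, segment) for each pattern match, 1-based and inclusive."""
    return [
        (m.start() + 1, m.end(), m.group())
        for m in C2H2_PATTERN.finditer(seq.upper())
    ]


for start, end, segment in find_c2h2_fingers(ZIF268_1AAY):
    print(f"{start:>3}-{end:<3} {segment}")
```

On this sequence the sketch reports three non-overlapping matches (residues 6-29, 36-57 and 64-85), one for each finger. A fixed-spacing scan like this is stricter than a profile HMM, so divergent fingers can be missed.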

zf-c2h2 Family Diversity

Currently, there are 32,874 identified zinc fingers of the type zf-c2h2 (Pfam 17.0). They occur in 5,264 proteins, which are represented in 235 different architectures.

Distribution:
- Eukaryota: 5,233 proteins
- Vertebrata: 3,435 proteins
- Amphibians: 218 proteins
- Humans: 1,390 proteins
- Mice: 1,085 proteins
- Fungi: 395 proteins
- Viruses: 19 proteins
- Archaea: 12 proteins

zf-c2h2 MSA
Snapshot of the multiple sequence alignment for the domain (* marks a conserved residue):

EGR1_HUMAN/396-418    FACD...ICG...RKFARS...DERKRHTKI...H
ZFP60_MOUSE/484-506   FECK...ECG...KAFHFS...SQLNNHKTS...H
ACE2_YEAST/633-657    YSCDF.PGCT...KAFVRN...HDLIRHKIS...H
SUHW_DROAN/349-373    YACK...ICG...KDFTRS...YHLKRHQKYS.SC
ZNF76_HUMAN/285-309   YTCPE.PHCG...RGFTSA...TNYKNHVRI...H
TTKB_DROME/538-561    YPCP...FCF...KEFTRK...DNMTAHVKI..IH
XFIN_XENLA/1044-1066  YKCG...LCE...RSFVEK...SALSRHQRV...H
Q17793_CAEEL/209-234  YQCQ...LCK...KSISRHGQYANLLNHLSR...H
TF3A_BUFAM/161-187    YPCRKDSTCP...FVGKTW...SDYMKHAAE..LH
ZN592_HUMAN/1043-1069 YTCG...YCTEDSPSFPRP...SLLESHISL..MH
                        *     *                  *      *
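One simple way to reproduce conservation marks like the * line in this snapshot is to flag every column whose most common residue fills at least 90% of the rows, ignoring gaps. This is only a sketch. The 90% cutoff and the helpers `conserved_columns` and `star_line` are our assumptions, not how Pfam computes its annotation.

```python
from collections import Counter

# Rows copied from the zf-c2h2 alignment snapshot ('.' marks a gap).
ALIGNMENT = {
    "EGR1_HUMAN/396-418": "FACD...ICG...RKFARS...DERKRHTKI...H",
    "ZFP60_MOUSE/484-506": "FECK...ECG...KAFHFS...SQLNNHKTS...H",
    "ACE2_YEAST/633-657": "YSCDF.PGCT...KAFVRN...HDLIRHKIS...H",
    "SUHW_DROAN/349-373": "YACK...ICG...KDFTRS...YHLKRHQKYS.SC",
    "ZNF76_HUMAN/285-309": "YTCPE.PHCG...RGFTSA...TNYKNHVRI...H",
    "TTKB_DROME/538-561": "YPCP...FCF...KEFTRK...DNMTAHVKI..IH",
    "XFIN_XENLA/1044-1066": "YKCG...LCE...RSFVEK...SALSRHQRV...H",
    "Q17793_CAEEL/209-234": "YQCQ...LCK...KSISRHGQYANLLNHLSR...H",
    "TF3A_BUFAM/161-187": "YPCRKDSTCP...FVGKTW...SDYMKHAAE..LH",
    "ZN592_HUMAN/1043-1069": "YTCG...YCTEDSPSFPRP...SLLESHISL..MH",
}


def conserved_columns(rows, min_fraction=0.9, gap="."):
    """Return 1-based columns whose most common residue fills >= min_fraction of rows."""
    rows = list(rows)
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("alignment rows must all have the same length")
    columns = []
    for col in range(width):
        # Count residues only; gaps never count toward conservation.
        counts = Counter(r[col] for r in rows if r[col] != gap)
        if counts and counts.most_common(1)[0][1] / len(rows) >= min_fraction:
            columns.append(col + 1)
    return columns


def star_line(width, columns):
    """Build a '*' marker line under the given 1-based columns."""
    return "".join("*" if i + 1 in columns else " " for i in range(width)).rstrip()


cols = conserved_columns(ALIGNMENT.values())
print(cols)
print(star_line(35, cols))
```

With this cutoff the sketch flags four columns, 3, 9, 28 and 35. These are the two zinc-coordinating cysteines and the two histidines. Column 35 is H in nine rows and C in SUHW_DROAN. Raising `min_fraction` to 1.0 leaves only the three columns that are identical in every row.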

There are 42 structures of zf-c2h2 proteins in the Protein Data Bank, 11 of which were applicable to our interests. Of the 42 structures:
- 20 were determined by X-ray crystallography.
- 22 were determined by NMR.
- At least 15 were duplicate structures.
We only considered structures that were determined by X-ray crystallography and had either 2 or 3 zinc fingers, as these would belong to the EGR family.

Homology Modeling
We chose two proteins with known structures, 1G2D and 1AAY, to perform homology modeling. This allows us to compare the predicted structure against the known structure to determine the accuracy of the prediction. A Zif268 variant (1G2D) was selected as the target of the homology modeling, with Zif268 (1AAY) as the template. 1G2D recognizes the DNA sequence 5'-GCTATAAAA-3'. The sequences are 83% similar, with 81% sequence identity:

1AAY  MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS
      MERPYACPVESCDRRFS+   L  HIRIHTGQKPFQCRICMRNFS
1G2D  MERPYACPVESCDRRFSQKTNLDTHIRIHTGQKPFQCRICMRNFS

1AAY  RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD
      +   L  HIRTHTGEKPFACDICGRKFA    R RHTKIHLRQKD
1G2D  QHTGLNQHIRTHTGEKPFACDICGRKFATLHTRDRHTKIHLRQKD
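The identity and similarity figures can be re-derived from the pairwise alignment shown above. The sketch counts identical positions directly from the two gapless sequences. It takes the similar ('+') positions from the BLAST-style middle line. The `MIDLINE` string is our reconstruction of that line with its spacing restored, which is an assumption. The function names are hypothetical.

```python
# Full-length sequences from the 1AAY / 1G2D alignment (no gaps, equal length).
SEQ_1AAY = (
    "MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS"
    "RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD"
)
SEQ_1G2D = (
    "MERPYACPVESCDRRFSQKTNLDTHIRIHTGQKPFQCRICMRNFS"
    "QHTGLNQHIRTHTGEKPFACDICGRKFATLHTRDRHTKIHLRQKD"
)
# BLAST-style middle line: letter = identical, '+' = similar, ' ' = neither.
# Reconstructed with its spacing restored (assumption).
MIDLINE = (
    "MERPYACPVESCDRRFS+   L  HIRIHTGQKPFQCRICMRNFS"
    "+   L  HIRTHTGEKPFACDICGRKFA    R RHTKIHLRQKD"
)


def percent_identity(a, b):
    """Percentage of aligned positions with identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def percent_similarity(midline):
    """Percentage of positions marked identical (a letter) or similar ('+')."""
    return 100.0 * sum(c != " " for c in midline) / len(midline)


# Sanity check: midline letters must sit exactly where the two sequences agree.
assert all(m.isalpha() == (x == y) for m, x, y in zip(MIDLINE, SEQ_1AAY, SEQ_1G2D))

print(f"identity:   {percent_identity(SEQ_1AAY, SEQ_1G2D):.1f}%")
print(f"similarity: {percent_similarity(MIDLINE):.1f}%")
```

This reproduces the quoted 81% identity (73 of 90 positions) and 83% similarity (75 of 90 positions).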

We focused on positions -1, 3 and 6 of the α-helix. The Consensus Server, developed in part by Dr. Camacho, was used to perform the homology modeling (http://structure.bu.edu/cgibin/consensus/consensus.cgi). Because the Consensus method relies on threading algorithms, side chains can only be predicted to the extent that they match the corresponding amino acid in the template. For example, where a serine in the template aligns with a lysine in the target, the method can place only the Cα and Cβ atoms. This leaves the positions of the remaining four side-chain heavy atoms (Cγ, Cδ, Cε and Nζ) undetermined. CHARMM was used to complete the side chains.

Side Chain Relaxation via Molecular Dynamics Simulations
- We relaxed the side chains of each domain independently to find the most favorable state.
- Simulations were run with a constrained backbone to preserve the structure predicted in the previous step.
- Total run time was 4.2 ns per domain, including 200 ps of system equilibration, with a 2 fs time step.
- The simulations included no ions and no DNA.
- We were particularly interested in the states of the three residues involved in DNA recognition.
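The serine/lysine example can be made concrete by listing which heavy atoms share a PDB atom name between the template and target residues. Under this rule, only those atoms can be copied. This is a simplified, hypothetical name-matching rule written for illustration. The table covers only a few residues, and `transferable_atoms` is our own helper. It is not the Consensus Server's actual procedure.

```python
# Backbone heavy atoms shared by every residue (PDB atom names).
BACKBONE = ["N", "CA", "C", "O"]

# Side-chain heavy atoms for a few residues (PDB atom names); partial table.
SIDE_CHAIN = {
    "GLY": [],
    "ALA": ["CB"],
    "SER": ["CB", "OG"],
    "THR": ["CB", "OG1", "CG2"],
    "LYS": ["CB", "CG", "CD", "CE", "NZ"],
    "GLN": ["CB", "CG", "CD", "OE1", "NE2"],
    "ARG": ["CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
}


def transferable_atoms(template_res, target_res):
    """Split the target residue's heavy atoms into (copied, missing), assuming an
    atom can be copied only if the template residue has an atom of the same name."""
    template_atoms = set(BACKBONE + SIDE_CHAIN[template_res])
    target_atoms = BACKBONE + SIDE_CHAIN[target_res]
    copied = [a for a in target_atoms if a in template_atoms]
    missing = [a for a in target_atoms if a not in template_atoms]
    return copied, missing


# E.g. alignment position 19: serine in 1AAY (template), lysine in 1G2D (target).
copied, missing = transferable_atoms("SER", "LYS")
print("copied: ", copied)
print("missing:", missing)
```

Here the lysine keeps its backbone and CB, and CG, CD, CE and NZ are left for a side-chain builder to place. The same helper shows that the Arg-to-Gln change at alignment position 18 keeps CB, CG and CD but leaves OE1 and NE2 to be built.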

RMSD Analysis and Clustering
- RMSD analysis was performed between the results of the MD simulations and the crystal structure.
- Cα atoms were aligned before the calculation so that the RMSD was minimized.
- Where applicable, the RMSD was also calculated with symmetry-equivalent atoms exchanged (e.g. in arginine residues) to further minimize the RMSD.
- A neighbor clustering algorithm was also applied to the snapshots produced by the MD simulation. It was performed on a single side chain and calculated for all pairs of snapshots, using a 1.0 Å threshold. Clusters were ranked by the number of snapshots they included.

[Figure: RMSD (Å) vs. time (ns) for position 3 in the helix]
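The RMSD and clustering steps can be sketched in pure Python under several assumptions. Snapshots are dictionaries of side-chain atom coordinates that have already been superposed on the Cα atoms. Only arginine's NH1/NH2 pair is treated as symmetric. The clustering is a greedy neighbor-count scheme in the style of Daura et al.: take the snapshot with the most neighbors within the cutoff as a cluster center, remove it and its neighbors, and repeat. We assume this resembles, but may not match, the algorithm used here. All function names are hypothetical.

```python
import math
from itertools import combinations

# Symmetry-equivalent atom pairs whose labels may be swapped (assumption: ARG only).
SYMMETRIC_PAIRS = {"ARG": [("NH1", "NH2")]}


def rmsd(model, reference, names):
    """Plain RMSD (Å) over the named atoms, with no further superposition."""
    total = sum((model[n][k] - reference[n][k]) ** 2 for n in names for k in range(3))
    return math.sqrt(total / len(names))


def symmetric_rmsd(resname, model, reference):
    """RMSD minimized over swapping each symmetry-equivalent pair (one pair at a time)."""
    names = sorted(reference)
    best = rmsd(model, reference, names)
    for a, b in SYMMETRIC_PAIRS.get(resname, []):
        swapped = dict(model)
        swapped[a], swapped[b] = model[b], model[a]
        best = min(best, rmsd(swapped, reference, names))
    return best


def neighbor_clustering(snapshots, distance, cutoff=1.0):
    """Greedy neighbor clustering; returns (center, members) ranked by cluster size."""
    n = len(snapshots)
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):  # all pairs of snapshots
        if distance(snapshots[i], snapshots[j]) <= cutoff:
            neighbors[i].add(j)
            neighbors[j].add(i)
    remaining = set(range(n))
    clusters = []
    while remaining:
        # The snapshot with the most still-unassigned neighbors becomes the next center.
        center = max(sorted(remaining), key=lambda k: len(neighbors[k] & remaining))
        members = {center} | (neighbors[center] & remaining)
        clusters.append((center, sorted(members)))
        remaining -= members
    clusters.sort(key=lambda c: len(c[1]), reverse=True)
    return clusters


# Toy arginine whose NH1/NH2 labels are swapped relative to the reference.
ref = {"CZ": (0.0, 0.0, 0.0), "NH1": (1.0, 0.0, 0.0), "NH2": (-1.0, 0.0, 0.0)}
model = {"CZ": (0.0, 0.0, 0.0), "NH1": (-1.0, 0.0, 0.0), "NH2": (1.0, 0.0, 0.0)}
print(rmsd(model, ref, sorted(ref)), symmetric_rmsd("ARG", model, ref))
print(neighbor_clustering([ref, model], lambda a, b: symmetric_rmsd("ARG", a, b)))
```

For the toy arginine, the plain RMSD is about 1.63 Å, while the symmetry-aware RMSD is 0. The two labelings therefore fall into one cluster at the 1.0 Å cutoff. A production analysis would also handle other symmetric side chains (e.g. Asp, Glu, Phe, Tyr) and read coordinates from the trajectory.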

[Figure: RMSD (Å) vs. time (ns) for residue 6 in helix 3]

Results
Through RMSD and cluster analysis, we found that most residues reach an equilibrium state that is highly similar to the crystal structure. Cluster analysis revealed that the most populated cluster (the one with the most neighbors) is generally highly similar to the crystal structure. A few residues appear to fluctuate between two states during the simulation, as can be seen in Figure 3. We hypothesize that this fluctuation may be correlated with the mechanism by which the protein recognizes DNA.

Other Models
This method was also applied in two other cases:
- Modeling Zif268 (1AAY) using the Zif268 variant (1G2D) as the template.
- Modeling a designed zinc finger (1MEY) using Zif268 as the template. 1MEY shares 49% identity with 1AAY and 47% identity with 1G2D.
Preliminary results and analysis show findings similar to those for 1G2D modeled on 1AAY.

Conclusions and Future Applications
With this method we can effectively build homology models of zinc finger proteins, specifically zinc finger proteins in the EGR family. The modeled side chains adopt states similar to those in the crystal structure, even in the unbound state. This is particularly important for the key residues involved in DNA recognition. Because the modeled domains are in a favorable conformation, docking experiments can be performed with homology-modeled zinc fingers. This is currently being done with a DNA-protein docking algorithm developed in the lab. Future applications include modeling EGR proteins with an undetermined structure to test whether the model recognizes the correct DNA sequence.

Acknowledgements
Dr. Carlos J. Camacho, Advisor; Christoph Champ; BBSI; Department of Computational Biology, University of Pittsburgh; NIH; NSF.

References
J.C. Prasad, S.R. Comeau, S. Vajda, and C.J. Camacho. Consensus alignment for reliable framework prediction in homology modeling. Bioinformatics 2003, 19: 1682-1691.
G. Paillard, C. Deremble, and R. Lavery. Looking into DNA recognition: zinc finger binding specificity. Nucleic Acids Research 2004, 32: 6673-6682.
A. Bateman, L. Coin, R. Durbin, R.D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E.L.L. Sonnhammer, D.J. Studholme, C. Yeats, and S.R. Eddy. The Pfam protein families database. Nucleic Acids Research 2004, 32 (Database issue): D138-D141.