Finding Motifs in Protein Sequences and Marking Their Positions in Protein Structures

Similar documents
08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Phylogenetic analyses. Kirsi Kostamo

Introducing Hippy: A visualization tool for understanding the α-helix pair interface

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Introduction to Bioinformatics Online Course: IBT

Supplementary Figure 1 Histogram of the marginal probabilities of the ancestral sequence reconstruction without gaps (insertions and deletions).

Journal of Proteomics & Bioinformatics - Open Access

Protein Bioinformatics Computer lab #1 Friday, April 11, 2008 Sean Prigge and Ingo Ruczinski

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

CELL CYCLE AND DIFFERENTIATION

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family

BIOINFORMATICS LAB AP BIOLOGY

Computational Analysis of the Fungal and Metazoan Groups of Heat Shock Proteins

Bioinformatics. Dept. of Computational Biology & Bioinformatics

CSCE555 Bioinformatics. Protein Function Annotation

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Multiple Sequence Alignment. Sequences

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Session 5: Phylogenomics

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Visualization of Macromolecular Structures

Lecture 10: Cyclins, cyclin kinases and cell division

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

K. Structure Homology Modeling of

1 Abstract. 2 Introduction. 3 Requirements. 4 Procedure

Week 10: Homology Modelling (II) - HHpred

Comparing whole genomes

Initiation of DNA Replication Lecture 3! Linda Bloom! Office: ARB R3-165! phone: !

Algorithms in Bioinformatics

Cyclin-Dependent Kinase

Protein Modeling Methods. Knowledge. Protein Modeling Methods. Fold Recognition. Knowledge-based methods. Introduction to Bioinformatics

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Research Article HomoKinase: A Curated Database of Human Protein Kinases

Ch. 9 Multiple Sequence Alignment (MSA)

Arboretum Explorer: Using GIS to map the Arnold Arboretum

Goals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

Modelling of Possible Binding Modes of Caffeic Acid Derivatives to JAK3 Kinase

FIG S1: Rarefaction analysis of observed richness within Drosophila. All calculations were

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Using Bioinformatics to Study Evolutionary Relationships Instructions

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Supplementary Figure 1. The workflow of the KinomeXplorer web interface.

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Open a Word document to record answers to any italicized questions. You will the final document to me at

RNA Protein Interaction

Objectives. Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae. Emily Germain 1,2 Mentor Dr.

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Pymol Practial Guide

Last updated: Copyright

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

Comparative genomics: Overview & Tools + MUMmer algorithm

Biological networks CS449 BIOINFORMATICS

PSODA: Better Tasting and Less Filling Than PAUP

CGS 5991 (2 Credits) Bioinformatics Tools

CS612 - Algorithms in Bioinformatics

Semantic Integration of Biological Entities in Phylogeny Visualization: Ontology Approach

Introduction to Evolutionary Concepts

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

HOMOLOGY MODELING AND MOLECULAR DYNAMICS OF CYCLIN-DEPENDENT PROTEIN KINASE

Sequence Alignment Techniques and Their Uses

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Measuring quaternary structure similarity using global versus local measures.

Genomics and bioinformatics summary. Finding genes -- computer searches

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Tree Building Activity

Quiz answers. Allele. BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 17: The Quiz (and back to Eukaryotic DNA)

Hands-On Nine The PAX6 Gene and Protein

Thursday, January 14. Teaching Point: SWBAT. assess their knowledge to prepare for the Evolution Summative Assessment. (TOMORROW) Agenda:

Browsing Genes and Genomes with Ensembl

Molecular Visualization. Introduction

# shared OGs (spa, spb) Size of the smallest genome. dist (spa, spb) = 1. Neighbor joining. OG1 OG2 OG3 OG4 sp sp sp

Multiple sequence alignment

Getting To Know Your Protein

Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae. Emily Germain, Rensselaer Polytechnic Institute

Sequence Based Bioinformatics

Sequencing alignment Ameer Effat M. Elfarash

Tools and Algorithms in Bioinformatics

NIH Center for Macromolecular Modeling and Bioinformatics Developer of VMD and NAMD. Beckman Institute

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Protein Interaction Mapping: Use of Osprey to map Survival of Motor Neuron Protein interactions

NIH Center for Macromolecular Modeling and Bioinformatics Developer of VMD and NAMD. Beckman Institute

Design and implementation of a new meteorology geographic information system

A simple model for the eukaryotic cell cycle. Andrea Ciliberto

Plant Molecular and Cellular Biology Lecture 8: Mechanisms of Cell Cycle Control and DNA Synthesis Gary Peter

I. Short Answer Questions DO ALL QUESTIONS

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space

Three-dimensional Cluster Analysis Identifies Interfaces and Functional Residue Clusters in Proteins

Francisco Melo, Damien Devos, Eric Depiereux and Ernest Feytmans

PNmerger: a Cytoscape plugin to merge biological pathways and protein interaction networks

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC

Analysis and Prediction of Protein Structure (I)

Summary Table of Title III Project Course Redesign Modules

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week:

EVOLUTIONARY DISTANCES

BSC 4934: QʼBIC Capstone Workshop" Giri Narasimhan. ECS 254A; Phone: x3748

frmsdalign: Protein Sequence Alignment Using Predicted Local Structure Information for Pairs with Low Sequence Identity

Supplementary Methods

Transcription:

MotifMarker Demo This document contains a demo of MotifMarker used to study the structures of the CDK2,CDC2, CDK4 and CDK6 highlighting the 3 important domains/motifs, namely, the PSTAIRE region, ATP binding domain and kinase domain, in these cyclin-dependent kinases and try to relate the structural differences and similarities to their functions and cyclin partner specificity. Methods & Programs Used Finding Motifs in Protein Sequences and Marking Their Positions in Protein Structures A java program MotifMarker is designed and implemented to fish out motifs in sequences contained in fasta files and generate a report in HTML format and corresponding scripts for each results according to the template script provided. The program uses simple console environment and requires Java 2 Platform Standard Edition (J2SE) 5.0 to compile and run. The program first reads in the fasta file containing the sequences, and loads in a consensus files containing the motifs to find and a script templates containing the template to create scripts (These templates can be written in different languages so that scripts can be generated for running in Rasmol, Spb and adapting MotifMarker to different scriptable bioinformatics software only requires writing the corresponding script templates.) The source codes and compiled program is available online at http://albertwcheng.inscyber.net/motifmarker.htm More details of MotifMarker are given in Appendix 1. Obtaining Protein Structures Protein structures were searched for on the NCBI homepage and Protein Data Bank(PDB) for existing experimentally determined structures. However, most of the structures of the protein of interest were studied as complexes and most of these structures are for Homo sapiens. In order to easily obtain a structure of the protein from the complex and do corrections on the structure to predict the protein structure not in complex and also to predict protein structure of the cell cycle proteins of species that no experimentally determined structures have been created and whose sequences have high sequence similarity to that of the available structures, Swiss-model homology modeling protein prediction was applied. Fasta sequences are inputted into the Swiss-model homology Demo 1 of 8

modeling prediction server First Approach Mode online interface and results were emailed back and analyzed. Visualizing and manipulating protein structure 3D models The protein structure 3D models obtained from SWISS-MODEL server were analyzed and manipulated in DeepView/Swiss-PdbViewer. Superimposition of structures were performed by iterative magic fit function of the view, the sequence of the fits were determined by the phylogenetic trees and the distance matrixes of the proteins of interest, where the most similar proteins were fit together first, followed by less similar proteins, each fit into its own nearest neighbor until all structures were fitted. Tree building and Presentation The motifs marked by MotifMarker were analysised for phylogenetic relationship using PHYlogeny Inference Pakage (PHYLIP) and trees were visualized and presented in TreeView Demo 2 of 8

Results Table r1a PSTAIRE sequences of Homo sapiens CDKs CDC2 EGVPSTAIREISLLKE CDK2 EGVPSTAIREISLLKE CDK4 GGLPISTVREVALLRR CDK6 EGMPLSTIREVAVLRH * Consensus sequence found by MotifMarker using consensus PSTAIRE described in Table r1d Table r1b ATP binding sequences of Homo sapiens CDKs CDC2 GEGTYGVVYK CDK2 GEGTYGVVYK CDK4 GVGAYGTVYK CDK6 GEGAYGKVFK * Consensus sequence found by MotifMarker using consensus ATPBINDING described in Table r1d Table r2c Kinase sequences of Homo sapiens CDKs CDC2 HRDLKPQNLLI CDK2 HRDLKPQNLLI CDK4 HRDLKPENILV CDK6 HRDLKPQNILV * Consensus sequence found by MotifMarker using consensus KINASE described in Table r2d Table r1d Consensus for important domains in CDKs Consensus Consensus Regular Expression PSTAIRE XxxPS/I/LT/SA/TI/VRExxxxxx [A-Z]{3}P[SIL][TS][AT][IV]RE[A-Z]{6} ATPBINDING GE/VGA/TYGxVxK G[EV]G[AT]YG[A-Z]V[A-Z]K KINASE HRDLKPQ/ENL/ILV/I HRDLKP[QE]N[LI]L[VI] Demo 3 of 8

Figure r2 Cladogram of Homo sapiens CDK PSTAIRE sequences Figure r3 Phylogram of Homo sapiens Cyclin box sequences Demo 4 of 8

Figure r4 3D-structural model of Homo sapiens CDK2 highlighting the PSTAIRE region, ATP-binding domain and the kinase domain Demo 5 of 8

Figure r5 Superposition of Homo sapiens CDKs showing the three important domains Demo 6 of 8

Discussions Structure Superposition of CDKs revealed significant misfit in PSTAIRE region among other domains. CDKs with their PSTAIRE region, ATP binding, and kinase domains marked by MotifMarker were superposed and analyzed (fig r4,r5). In fig r5, only the backbone in the regions of interest was shown. Among the three regions, kinase domain showed highly fit backbone, ATP binding domain showed some degree of misfed, which may be responsible for the specificity of the activation of the kinase activity. The PSTAIRE region which is the interface for binding of the regulatory subunit Cyclin showed a significant misfit among the four kinases clustering into two major groups, CDK4/CDK6 in one group, CDK2/CDC2 in another. This observation was consistent in the fact that CDK2 and CDC2 are activated both by Cyclin A while CDK4 and CDK6 are both activated by Cyciln D. This structural clustering corresponding to the binding of the same cyclin and structural misfit corresponding to the binding of different cyclins suggested that the structure of the PSTAIRE regions confer specificity and selectivity of differential cyclin binding and subsequent activation of the kinase. Demo 7 of 8

References The RSBC Protein Data Bank (online) http://www.rcsb.org/pdb/ [Available: 12 May 2005] Martz E. (n.d.) Rasmol Homepage (online) http://www.umass.edu/microbio/rasmol/index2.htm [Available: 12 May 2005] Hanks S.K. et al. (1987) The Protein Kinase Family: Conserved features and deduced phylogeny of the catalytic domains. Science 241:42-52 Morgan D.O. (1996) The dynamics of cyclin dependent kinase structure. Current Opinion in Cell Biology 8:767-772 Garrett M.D. (2001) Cell cycle control and cancer. Current Science 81:515-522 Nugent J.H.A. et al. (1991) Conserved structural motifs in cyclins identified by sequence analysis. Journal of Cell Sciences 99:669-674 Guex N, and Schwede T. (n.d.) DeepView Scripting Language. (Online) http://tw.expasy.org/spdbv/text/script.htm [Available: 12 May 2005] Schwede T, Kopp J, Guex N, and Peitsch MC (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Research 31: 3381-3385. Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modelling. Electrophoresis 18: 2714-2723. Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660. Demo 8 of 8