Computational Analysis of the Fungal and Metazoan Groups of Heat Shock Proteins

Benjamin Cooper, The Pennsylvania State University
Advisor: Dr. Hugh Nicolas, Biomedical Initiative, Carnegie Mellon University

Introduction:
Biological applications of computers are growing rapidly as computer technology improves. One such application is visualizing the three-dimensional structure of proteins. A protein's amino acid sequence is closely tied to the three-dimensional structure on which its function depends. Over evolutionary time, mutation has changed the amino acid sequences of many proteins in many ways, yet the overall function of these proteins has been preserved. Even after millions of years and a great many mutations, certain regions of the amino acid sequence have remained identical across a protein family. I hope to discover these conserved regions, which may reveal the structurally critical regions of the selected proteins.

In choosing a protein group to analyze, Drosophila was of particular interest because the organism has been studied extensively. One of the key events in the development of the Drosophila embryo is the formation of the ventral furrow. During this invagination, gene expression levels and protein concentrations change dramatically. One protein that changes in this way is heat shock protein 23 (HSP23), whose concentration increases more than 100-fold in the 15 minutes after gastrulation begins [1]. This increase has sparked interest in the protein and in the family to which it belongs.

HSP23 is a member of the alpha-crystallin-related small heat shock protein family, whose members have diverse functions. Of particular interest, these proteins act as molecular chaperones that are critical to the stress response of many cell types. They protect against oxidative stress and have anti-apoptotic properties. Recent studies have linked members of this family to some neurological disorders [2], demonstrating the family's important biological role.

A key piece of information about any protein is its three-dimensional structure. With it, drugs can be designed to block the protein's active site or to modulate its activity through allosteric interactions. One way to obtain this information is to crystallize every protein in the family, but this is tedious and sometimes extremely challenging. More efficient approaches are therefore preferred for investigating protein structure, and the approach taken here is carried out entirely in silico. The iProClass database [3] contains 122 proteins from the alpha-crystallin-related small heat shock protein family belonging to metazoan and fungal organisms. The goal of my research project is to compare these protein sequences using several software packages and to analyze the similarities and differences among them. In this way, I will identify the regions of the amino acid sequences that are essential to the function of the proteins, as well as the non-conserved regions that account for the diversity of the protein family.

Methods:
The sequence analysis is essentially a seven-step process, followed by visualization. First, the amino acid sequences will be retrieved from the iProClass database [3], and all 122 sequences will be compiled into a single text file. The sequences will then be aligned with T-Coffee [4], which computes an approximate multiple sequence alignment. Next, the sequences will be analyzed with MEME [5], which searches all of the sequences for motifs, short patterns that recur across many of them, by fitting a mixture model with expectation maximization. The outputs of the two programs will then be combined as input for GeneDoc [6], an alignment editing and visualization platform used to locate the highly conserved regions, which are likely to be critical to the viability and function of the protein. Once the conserved regions are isolated, the non-conserved regions also become evident. Next, programs from the PHYLIP suite [7] will perform a bootstrap analysis on the aligned sequences to try to divide the 122 sequences into smaller subfamilies. The subfamilies will then be analyzed with SeqSpace [8] to confirm them and to determine which residues contribute most to the characteristics of each subfamily.
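
The conservation step can be illustrated with a per-column identity score, which is one simple way to flag conserved columns; it does not reproduce GeneDoc's own shading options. The sketch below is a minimal Python example assuming the T-Coffee alignment has already been read into equal-length strings with '-' marking gaps. The function names, the 0.8 identity threshold, and the three-column minimum run length are illustrative assumptions, and the toy sequences are made up rather than taken from the 122 iProClass entries.

```python
from collections import Counter


def column_identity(alignment: list[str]) -> list[float]:
    """For each alignment column, return the fraction of sequences that share
    the most common non-gap residue. Gaps ('-') count in the denominator but
    are never treated as the majority residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residues = Counter(res for res in column if res != "-")
        top_count = residues.most_common(1)[0][1] if residues else 0
        scores.append(top_count / len(column))
    return scores


def conserved_regions(scores: list[float], threshold: float = 0.8,
                      min_length: int = 3) -> list[tuple[int, int]]:
    """Return half-open (start, end) column ranges in which every score is at
    least `threshold` and the run spans at least `min_length` columns."""
    regions = []
    start = None
    # A -inf sentinel guarantees that a run reaching the last column is closed.
    for i, score in enumerate(scores + [float("-inf")]):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions


if __name__ == "__main__":
    # Toy alignment for illustration only (not real HSP sequences).
    toy = ["MKVLAGE", "MRVLAGD", "MKVLAGE"]
    scores = column_identity(toy)
    print([round(s, 2) for s in scores])  # prints [1.0, 0.67, 1.0, 1.0, 1.0, 1.0, 0.67]
    print(conserved_regions(scores))      # prints [(2, 6)]
```

Counting gaps in the denominator means a column that most sequences skip cannot score as conserved, which fits the aim of finding regions shared across the whole family; everything outside the reported ranges corresponds to the non-conserved regions.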

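The bootstrap step resamples alignment columns with replacement to create many pseudo-alignments; in PHYLIP, SEQBOOT generates such replicates and CONSENSE summarizes the trees built from them. The following minimal Python sketch is a simplified stand-in, not PHYLIP's procedure: instead of building a tree for each replicate, it reports how often two sequences are each other's nearest neighbor under p-distance. The helper names, the p-distance, and the mutual-nearest-neighbor criterion are assumptions chosen for illustration, and the toy sequences are invented.

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Draw alignment columns with replacement; every sequence uses the same
    sampled columns, so each replicate is still a valid alignment."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, skipping columns with a gap in either
    sequence. Returns 1.0 when no columns are comparable."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def closest(alignment: list[str], k: int) -> int:
    """Index of the sequence nearest to sequence k (ties go to the lower index)."""
    return min(
        (p_distance(alignment[k], alignment[m]), m)
        for m in range(len(alignment))
        if m != k
    )[1]


def mutual_neighbor_support(alignment: list[str], i: int, j: int,
                            replicates: int = 100, seed: int = 0) -> float:
    """Fraction of bootstrap replicates in which sequences i and j are each
    other's nearest neighbor (a hypothetical proxy for clade support)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_replicate(alignment, rng)
        if closest(rep, i) == j and closest(rep, j) == i:
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    # Toy alignment for illustration only: two pairs of near-identical sequences.
    toy = ["MKVLAGEIRK", "MKVLAGEIRR", "PQWTYHNDCS", "PQWTYHNDCA"]
    print(mutual_neighbor_support(toy, 0, 1))  # prints 1.0
    print(mutual_neighbor_support(toy, 0, 2))  # prints 0.0
```

Groupings that persist across most replicates are the ones worth treating as candidate subfamilies; a real analysis would instead build a tree per replicate and count how often each clade appears.
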
Finally, a phylogenetic tree will be constructed from this information to visualize the relationships among the subfamilies. Once the sequences are aligned as closely as possible, a three-dimensional model of the protein can be constructed and color-coded in the visualization program RasMol [9] to show which regions are conserved among organisms in the family. These color-coded regions are likely to be important to the function of the protein and can be the main focus of future experiments and analyses. With knowledge of these regions, models of other family members can also be extrapolated from structures already deposited in the Protein Data Bank [10].

Possible Results and Implications:
After the first set of programs has been run on the sequences, it should be possible to sort the proteins into groups. These groups could provide several kinds of insight. One possible outcome would be to trace evolutionary pathways between organisms whose proteins belong to the same family. Another would be to propose functions for the different proteins in the family, which is especially useful where no function has yet been determined for a member. A three-dimensional model of the protein family would also suggest a number of testable hypotheses, which are an invaluable resource for the scientific community. Such a model would be useful to computational biologists testing hypotheses in fields ranging from neuroscience to oncology. These implications suggest that this research is worth the time and resources allocated to it.

References
[1] Gong, L. and Puri, M. 2004. Drosophila ventral furrow morphogenesis: a proteomic analysis. The Company of Biologists.
[2] Perng, M. D. and Quinlan, R. A. 2004. Neuroscience: On small heat shock proteins. Current Biology 14:R625.
[3] Wu, C., Huang, H., Nikolskaya, A., Hu, Z., Yeh, L. S., and Barker, W. C. 2004. The iProClass integrated database for protein functional analysis. Computational Biology and Chemistry 28:87-96.
[4] Notredame, C., Higgins, D., and Heringa, J. 2000. T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302:205-217.
[5] Bailey, T. L. and Elkan, C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36.
[6] Nicholas, K. B., Nicholas, H. B. Jr., and Deerfield, D. W. II. 1997. GeneDoc: Analysis and visualization of genetic variation. EMBNEW.NEWS 4:14.
[7] Felsenstein, J. 2004. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle.
[8] Casari, G., Sander, C., and Valencia, A. 1995. A method to predict functional residues in proteins. Nature Structural Biology 2:171-178.
[9] Sayle, R. and Milner-White, E. J. 1995. RasMol: Biomolecular graphics for all. Trends in Biochemical Sciences 20(9):374.
[10] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., and Bourne, P. E. 2000. The Protein Data Bank. Nucleic Acids Research 28:235-242. http://www.pdb.org/