Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae. Emily Germain, Rensselaer Polytechnic Institute

Mentor: Dr. Hugh Nicholas, Biomedical Initiative, Pittsburgh Supercomputing Center

Introduction

A central concept of biochemistry is that the amino acid sequence of a protein determines its three-dimensional structure, and that this structure in turn determines the protein's biochemical activity and function. Proteins with related structures or functions are considered to be part of the same protein family. Over time, mutations introduced into the genome have led to proteins with similar activity and structure but differing sequences. As species diverge, the differences become more prominent, but the proteins retain certain sequence elements that are critical to proper function. These changes can be mapped to produce an evolutionary tree that provides clues about how species are related and how long ago they diverged. We will use this information to learn which regions of the proteins are the most highly conserved and to form hypotheses about which sequence elements are essential to the overall structure or function.

Heat shock proteins (HSPs) are found in every living cell and have a broad range of functions. HSPs are part of the cell's response to stresses such as extremes of temperature, deprivation of oxygen or glucose, or exposure to toxins. They normally make up about two percent of a cell's soluble protein, but in a stressed cell they can account for twenty percent. They can be found in both the cytoplasm and the nucleus, depending on the specific type of HSP and the condition of the cell. HSPs help proteins denatured by stress to refold into their proper shape, and most can also chaperone the folding of newly made proteins. They transport other proteins between compartments within the cell and possibly function in the immune response by presenting abnormal peptides to molecules that move them to the cell surface. The presence of HSPs outside the cell is also a strong signal to the immune system that necrosis is taking place. The full range of functions of HSPs is unknown, but evidence shows that they are also important in the embryonic stages of Drosophila.

The protein that will be the focus of these analyses is heat shock protein 23 (HSP 23), a member of the alpha-crystallin-related small heat shock protein family. HSP 23 is present in small concentrations in a Drosophila embryo before gastrulation, but is 100 times more abundant in the first fifteen minutes after the start of ventral furrow formation (Gong et al., 2004). The reasons for this are still unknown, making this an interesting protein for study and analysis.

Methods

About 190 sequences of heat shock proteins from organisms of the Kingdom Viridiplantae will be extracted from the iProClass database. These sequences will be analyzed using two different programs, T-Coffee (Notredame et al., 2000) and MEME (Bailey et al., 1994). T-Coffee creates a global multiple sequence alignment, which attempts to line up the similar regions of all the sequences so that they can be compared across the entire set. The relationships found in the alignment are quantified in a tree, which will be used to group the most closely related sequences to aid visualization. Patterns will be identified, mapped, and organized to display visually the variations that exist between the different protein sequences. The alignment displays regions that are highly conserved and therefore likely to be critical to the protein's structure or function. The regions with higher degrees of variation are those that are less essential to the protein and are tolerant of mutations.
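One common way to put a number on "highly conserved" versus "tolerant of mutations" is the Shannon entropy of each alignment column. The sketch below is only an illustration under stated assumptions: the function names `column_entropy` and `conservation_profile` are hypothetical, the toy sequences are made up rather than taken from the HSP data set, and this is not the scoring that T-Coffee itself performs. Low entropy marks a conserved column; high entropy marks a variable one.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of the residues in one column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return None  # column contains only gaps, so entropy is undefined
    total = len(residues)
    counts = Counter(residues)
    # sum of p * log2(1/p) over the residues observed in this column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_profile(alignment):
    """Return one entropy value per column of a list of aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*alignment)]


# Toy alignment: made-up sequences used only to exercise the functions.
toy_alignment = [
    "MKG-LV",
    "MKGALV",
    "MRG-IV",
    "MKGALA",
]

profile = conservation_profile(toy_alignment)
for position, entropy in enumerate(profile, start=1):
    label = "all gaps" if entropy is None else f"{entropy:.3f} bits"
    print(position, label)
```

Columns with entropy 0 contain a single residue type in every ungapped sequence. In a real analysis, gap-rich columns would probably need separate handling, since ignoring gaps can make a sparsely populated column look perfectly conserved.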

MEME will be run using the Zero or One Per Sequence model to identify twenty motifs. The MEME program scans the sequences for patterns regardless of their placement along the protein. The patterns identified are position-independent conserved sequence elements that help in judging the accuracy of the T-Coffee results. These patterns are used to manually refine the global alignment. Both programs are well established, widely used for their respective tasks, and known for their good performance.

Using programs from the PHYLIP suite (Felsenstein, 2004), a bootstrap analysis will be performed to separate the proteins with similar biochemical activities into distinct subfamilies and to quantify how closely related the members of a subfamily are. A SeqSpace analysis (Casari et al., 1995) will calculate which columns of the alignment have the most and least similar sequence variations, to confirm the groupings and to identify which residues contribute to the characteristics of a particular subfamily. A phylogenetic tree will be constructed to visualize these relationships. A cross-entropy analysis will be performed using the GEnt program to identify which residues are unique to a particular subset of the family and contribute to its specific properties.

After the highly conserved sequences have been identified, three-dimensional graphical models will be constructed using RasMol, with a general representative chosen for each of the protein subfamilies. Important features and residues that define the subfamily will be highlighted, and the models will be used to form hypotheses about which areas make up the active site or bind to other molecules, which parts are critical to maintaining a functional structure, and how the protein performs its functions.
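The bootstrap analysis mentioned in the Methods rests on a simple idea: alignment columns are resampled with replacement to build replicate data sets, and a grouping is trusted in proportion to how often it reappears. The sketch below illustrates that idea only. It is not the PHYLIP implementation; the function names are hypothetical, the sequences are made up, and a nearest-neighbor test by Hamming distance stands in for the tree building that a real analysis would perform on each replicate.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(alignment, index):
    """Index of the sequence closest to alignment[index] (first one wins ties)."""
    others = [j for j in range(len(alignment)) if j != index]
    return min(others, key=lambda j: hamming(alignment[index], alignment[j]))


def pairing_support(alignment, i, j, n_replicates=200, seed=0):
    """Fraction of bootstrap replicates in which sequence j is nearest to i."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_replicates):
        replicate = resample_columns(alignment, rng)
        if nearest_neighbor(replicate, i) == j:
            hits += 1
    return hits / n_replicates


# Toy alignment: made-up sequences, not real heat shock protein data.
toy_alignment = [
    "MKGALVDE",
    "MKGALVDQ",
    "MRSAIVNE",
    "MRSTIANE",
]

support = pairing_support(toy_alignment, 0, 1)
print(f"support for sequence 0 pairing with sequence 1: {support:.2f}")
```

A support value near 1 means the pairing survives almost every resampling of the columns, so it does not depend on a handful of positions. In practice each replicate would be passed to a tree-building program and the support would be counted per branch of the tree.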

Expected Results and Interpretation

The map of aligned sequences produced from this research will aid in identifying possible roles of conserved and critical residues. It will identify which regions must be maintained in order for the protein to perform the functions that define the heat shock family of proteins. Distinct groups of more closely related sequences will suggest which proteins were at one time duplicates of others in the same organisms. Regions highly conserved in one group and less so in others will provide clues to the function of those regions when compared to differences in activity experimentally observed for the proteins in those groups. A group of sequences that share a conserved sequence element not present in others may yield information on which residues bind a substrate specific to proteins from that subgroup.

The three-dimensional model constructed of a heat shock protein and labeled with the highly conserved regions can give insight into how the protein works and where it binds substrates. It can be used to model the effects of sequence mutations and how they lead to the development of new or more refined functions. The results will be an accumulation of hypotheses about which residues are important features of the molecule, and they can be used as a starting point for experimental verification.

The set of sequences from the Kingdom Viridiplantae to be analyzed will be joined with similar data collected from heat shock proteins of animals and fungi. The larger body of collective information will be used to gain further insights into the differences in HSP 23 across a wider range of organisms. The information collected will lead to a better understanding of heat shock proteins and the role they play in cell function.

References

Bailey, T. L., and Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (1994): 28-36. AAAI Press, Menlo Park, California.

Casari, G., Sander, C., and Valencia, A. A method to predict functional residues in proteins. Nature Structural Biology 2 (1995): 171-178.

Felsenstein, J. PHYLIP: Phylogeny Inference Package. Department of Genome Sciences, University of Washington. 2004. http://evolution.genetics.washington.edu/phylip/doc/main.html

Gong, L., Puri, M., et al. Drosophila ventral furrow morphogenesis: a proteomic analysis. Development 131 (2004): 643-656.

Notredame, C., Higgins, D., and Heringa, J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302 (2000): 205-217.