Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae
Emily Germain(1,2); Mentor: Dr. Hugh Nicholas(3)
(1) Bioengineering & Bioinformatics Summer Institute, Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15261
(2) Departments of Biomedical Engineering and Biology, Rensselaer Polytechnic Institute, Troy, NY 12180
(3) Biomedical Initiative, Pittsburgh Supercomputing Center, Pittsburgh, PA 15213

Objectives
Identify characteristics and motifs of the protein family
Determine residues essential to the structure-function relationship
Organize the proteins into subfamilies
Locate residues unique to particular subfamilies
Make predictions regarding the protein family's evolution
Heat Shock Proteins
Present in the cytoplasm and nuclei of all living cells
Transcriptionally upregulated when the cell is stressed by temperature extremes, toxins, or oxygen or nutrient deprivation
Act as chaperones in the refolding of denatured proteins
Transport other proteins within the cell
Possible role in the immune response
Figure: Model heat shock protein from Methanococcus jannaschii
Multiple Sequence Alignment
Extract 190 Viridiplantae HSP sequences from the iProClass sequence database
Remove fragments; 167 sequences remain
Align the sequences with T-Coffee, performing a global multiple alignment of all sequences
Run MEME to locate motifs; 20 highly conserved patterns are identified
View the T-Coffee and MEME results together and refine the alignment by hand
Remove sequences not displaying multiple motifs; 161 sequences remain in the final alignment
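The two automated filtering steps in this workflow (dropping fragments, then dropping sequences without multiple motifs) can be sketched in Python as follows. The poster does not say how fragments were recognized or how many motifs counted as "multiple", so the length cutoff (80% of the median length) and the two-motif minimum are hypothetical parameters. The input dictionaries (sequence name to sequence, and sequence name to the set of MEME motif numbers found in it) are assumed formats, not the output of any particular tool; the hand refinement step is not modeled.

```python
from statistics import median


def drop_fragments(seqs: dict[str, str], min_fraction: float = 0.8) -> dict[str, str]:
    """Remove sequences shorter than min_fraction of the median length.

    Hypothetical fragment rule: the poster does not state its criterion.
    """
    cutoff = min_fraction * median(len(s) for s in seqs.values())
    return {name: s for name, s in seqs.items() if len(s) >= cutoff}


def keep_multi_motif(motif_hits: dict[str, set[int]], min_motifs: int = 2) -> list[str]:
    """Names of sequences containing at least min_motifs distinct MEME motifs.

    motif_hits maps sequence name -> set of motif numbers found in it
    (an assumed input format, e.g. parsed from MEME output beforehand).
    """
    return sorted(name for name, hits in motif_hits.items() if len(hits) >= min_motifs)
```

For example, with lengths 150, 160 and 40 the median is 150, so the cutoff is 120 and only the 40-residue sequence is dropped as a fragment.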
Figure: Multiple sequence alignment
Figure: Residues highly conserved over the family
Figure: MEME patterns in Group HSP 17
PHYLIP Bootstrap and Sequence Space Analyses
Input the refined MSA into both programs
SeqSpace calculates clusters and defines similarity vectors from the origin
PHYLIP bootstrap iterations create 1000 trees, which are compiled into a consensus tree
The combined output of PHYLIP and SeqSpace is used to define five subfamilies
Figure: Phylogenetic tree (subfamilies labeled Cord Moss, HSP 16-20, Grasses, HSP 17, HSP 22-23)
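The bootstrap and consensus steps can be illustrated with a minimal Python sketch. It shows only the two bookkeeping ideas behind a bootstrap consensus tree: each pseudo-replicate resamples alignment columns with replacement (the job of PHYLIP's SEQBOOT program), and the consensus keeps clades found in more than half of the replicate trees (plain majority rule). Tree inference itself is omitted; each tree is assumed to be given as a set of clades, each clade a frozenset of sequence names (a rooted simplification). The function names are hypothetical, and the consensus settings actually used in the study are not stated on the poster.

```python
import random
from collections import Counter


def bootstrap_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudo-replicate: resample alignment columns with replacement.

    The same sampled column indices are applied to every sequence, so each
    replicate column is a copy of some original column.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def majority_rule(trees: list[set[frozenset[str]]], threshold: float = 0.5) -> dict[frozenset[str], float]:
    """Clades present in more than `threshold` of the trees, with their frequency.

    Each tree is an assumed set-of-clades representation, not a PHYLIP file.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}
```

In a full run, each of the 1000 replicate alignments would go to a tree-building program, and the clade frequencies returned by the consensus step are the bootstrap support values shown on the consensus tree.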
Figure: Sequence space output (Dimension 2 vs. Dimension 3)

Group Entropy
Used PSC's (Pittsburgh Supercomputing Center) GEnt program
Calculates the group entropy distance for each defined subfamily
Gives a best-fit match for sequences that remain ungrouped
Residues with higher scores are unique to a particular subfamily and likely essential to its specific function
Group Entropy Distance: $D = \sum_i (p_i - q_i)\,\log_2\left(\frac{p_i}{q_i}\right)$, where $p_i$ is the foreground frequency of residue $i$ and $q_i$ is its background frequency
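The group entropy distance for a single alignment position can be computed directly from the formula above. This is a minimal sketch, not GEnt's implementation, whose details (background choice, gap handling, pseudocounts) are not given on the poster. It assumes the subfamily's column is the foreground, a whole-family column is the background, gaps and nonstandard symbols are ignored, and a pseudocount of 0.5 per amino acid is added so that every $p_i$ and $q_i$ is nonzero and the logarithm is defined.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(column: str, pseudocount: float = 0.5) -> dict[str, float]:
    """Frequencies of the 20 standard amino acids in one alignment column.

    Gaps and nonstandard symbols are ignored. The pseudocount is an assumption
    (not from the poster) that keeps every frequency above zero.
    """
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def group_entropy_distance(foreground: str, background: str) -> float:
    """D = sum_i (p_i - q_i) * log2(p_i / q_i) for one alignment position.

    foreground: the subfamily's residues at this position (assumed choice)
    background: the family's residues at this position (assumed choice)
    """
    p = residue_frequencies(foreground)
    q = residue_frequencies(background)
    return sum((p[aa] - q[aa]) * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS)
```

Every term is non-negative, because $p_i - q_i$ and $\log_2(p_i/q_i)$ always share a sign, so D is zero only when the two (smoothed) distributions match and grows as the subfamily column concentrates on residues that are rare in the background.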
Residue labels take the form [alignment index][predominant subfamily amino acid]-[predominant family amino acid]
A high group entropy indicates a conserved amino acid unique to the subfamily
Figure: Entropy for Group HSP 17 (group entropy by residue)
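A residue label in the [alignment index][predominant subfamily amino acid]-[predominant family amino acid] format might be built as in the sketch below. It assumes the index is the alignment column number as given, gaps are ignored when finding the predominant residue, and ties go to the residue seen first (the documented ordering of Python's `Counter.most_common`). Both helper names are hypothetical and are not taken from GEnt.

```python
from collections import Counter


def predominant(column: str) -> str:
    """Most frequent residue in an alignment column, ignoring gaps ('-').

    Ties go to the residue encountered first; an all-gap column returns '-'.
    """
    counts = Counter(c for c in column if c != "-")
    return counts.most_common(1)[0][0] if counts else "-"


def residue_label(index: int, subfamily_column: str, family_column: str) -> str:
    """Build a label such as '57A-G' (hypothetical helper, not GEnt's code)."""
    return f"{index}{predominant(subfamily_column)}-{predominant(family_column)}"
```

For example, `residue_label(57, "AAAG", "GGGGA")` returns `"57A-G"`: A dominates the subfamily column while G dominates the family column, which is the kind of position that would score high on group entropy.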
Figure: Residues from HSP 17 with high group entropy

Conclusions
Evolutionary relationships suggest that the different HSP variants arose through gene duplication
Each HSP is more closely related to HSPs from similar species than to HSPs of comparable molecular weight from more distantly related species
HSPs are highly conserved across the whole family; a few very specific residue alterations give particular subfamilies their individual properties
The data collected in this study can be analyzed further by comparing the highly conserved residues found in each group. These residues can then be matched with data on the specific function of each heat shock protein to generate hypotheses about how they contribute to functional specificity and biochemical properties.
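The proposed follow-up comparison of highly conserved residues across groups could start from something like the sketch below. It assumes each group is a list of equal-length aligned sequences and uses 0-based alignment positions; the 90% identity threshold for calling a column "highly conserved" is a hypothetical choice, and the function names are illustrative only. For each group, it reports the conserved (position, residue) pairs that no other group shares.

```python
from collections import Counter


def conserved_positions(group: list[str], min_identity: float = 0.9) -> dict[int, str]:
    """Map 0-based alignment index -> residue for columns where a single
    non-gap residue reaches min_identity within this group (hypothetical threshold)."""
    result = {}
    for i in range(len(group[0])):
        column = [seq[i] for seq in group]
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_identity:
            result[i] = residue
    return result


def unique_to_each_group(groups: dict[str, list[str]], min_identity: float = 0.9) -> dict[str, dict[int, str]]:
    """For each group, keep conserved (position, residue) pairs no other group shares."""
    conserved = {name: conserved_positions(seqs, min_identity) for name, seqs in groups.items()}
    unique = {}
    for name, positions in conserved.items():
        # Pool the conserved (position, residue) pairs of every other group.
        others = {
            (i, r)
            for other, pos in conserved.items()
            if other != name
            for i, r in pos.items()
        }
        unique[name] = {i: r for i, r in positions.items() if (i, r) not in others}
    return unique
```

A position conserved in every group with the same residue drops out, while a position conserved with different residues in two groups is reported for both, which is the case most relevant to subfamily-specific function.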
Resources
Bailey T.L., Elkan C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. 28-36. AAAI Press, Menlo Park, California.
Casari G., Sander C., Valencia A. 1995. A method to predict functional residues in proteins. Nature Structural Biology. 2: 171-178.
Felsenstein J. 2004. PHYLIP: Phylogeny Inference Package. Department of Genome Sciences, University of Washington. http://evolution.genetics.washington.edu/phylip/doc/main.html
Gong L., Puri M., et al. 2004. Drosophila ventral furrow morphogenesis: a proteomic analysis. Development. 131: 643-656.
Nicholas H.B. Jr., Ropelewski A., Deerfield D.W. II. 2000. Strategies for Searching Sequence Databases. BioTechniques. 28: 1174-1191.
Nicholas H.B. Jr., Ropelewski A., Deerfield D.W. II. 2002. Strategies for Multiple Sequence Alignment. BioTechniques. 32: 572-591.
Notredame C., Higgins D., Heringa J. 2000. T-Coffee: A novel method for multiple sequence alignments. J. Mol. Biol. 302: 205-217.

Acknowledgements
Dr. Hugh Nicholas Jr.
Pittsburgh Supercomputing Center
Rajan Munshi
National Institutes of Health
National Science Foundation
Everyone in BBSI