Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae


Emily Germain (1, 2); mentor: Dr. Hugh Nicholas (3)
(1) Bioengineering & Bioinformatics Summer Institute, Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15261
(2) Departments of Biomedical Engineering and Biology, Rensselaer Polytechnic Institute, Troy, NY 12180
(3) Biomedical Initiative, Pittsburgh Supercomputing Center, Pittsburgh, PA 15213

Objectives
- Identify characteristics and motifs of the protein family
- Determine residues essential to the structure-function relationship
- Organize the proteins into subfamilies
- Locate residues unique to particular subfamilies
- Make predictions regarding the protein family's evolution

Heat Shock Proteins
- Present in all living cells, in the cytoplasm and nuclei
- Transcriptionally upregulated when the cell is stressed by:
  - extremes of temperature
  - toxins
  - oxygen or nutrient deprivation
- Chaperone the refolding of denatured proteins
- Transport other proteins within the cell
- Possible role in the immune response

Figure: Model heat shock protein from Methanococcus jannaschii

Multiple Sequence Alignment
1. Extract 190 Viridiplantae HSP sequences from the iProClass sequence database
2. Remove fragments; 167 sequences remain
3. Align the sequences with T-Coffee, performing a global multiple alignment of all sequences
4. Run MEME to locate motifs; 20 highly conserved patterns are identified
5. View the T-Coffee and MEME results together and refine the alignment by hand
6. Remove sequences that do not display multiple motifs; 161 sequences remain in the final alignment
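The two filtering steps (fragment removal and the multiple-motif check) can be sketched in Python. This is a minimal illustration, not the pipeline actually used: the fragment cutoff (`min_fraction`), the reading of "multiple motifs" as at least two distinct MEME motifs, and the helper names are all hypothetical, and the T-Coffee alignment and hand refinement are not modeled.

```python
from statistics import median


def remove_fragments(seqs, min_fraction=0.8):
    """Drop sequences shorter than min_fraction x the median sequence length.

    `min_fraction` is a hypothetical cutoff: the poster does not say how
    fragments were recognized.
    """
    cutoff = min_fraction * median(len(s) for s in seqs.values())
    return {sid: s for sid, s in seqs.items() if len(s) >= cutoff}


def keep_multi_motif(seqs, motif_hits, min_motifs=2):
    """Keep sequences in which at least `min_motifs` distinct motifs occur.

    `motif_hits` maps sequence id -> set of motif ids, e.g. parsed from MEME
    output (the parsing step is assumed and not shown).
    """
    return {sid: s for sid, s in seqs.items()
            if len(motif_hits.get(sid, set())) >= min_motifs}


if __name__ == "__main__":
    seqs = {"a": "M" * 160, "b": "M" * 150, "c": "M" * 40, "d": "M" * 155}
    hits = {"a": {1, 2, 5}, "b": {3}, "d": {1, 4}}
    full = remove_fragments(seqs)          # "c" is dropped as a fragment
    final = keep_multi_motif(full, hits)   # "b" is dropped: only one motif
    print(sorted(full), sorted(final))     # ['a', 'b', 'd'] ['a', 'd']
```

Using a fraction of the median length keeps the cutoff relative to the family rather than a fixed residue count, which is one reasonable choice among several.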

Figure: Multiple sequence alignment

Figure: Residues highly conserved over the family
Figure: MEME patterns in group HSP 17
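The poster does not state how high conservation was scored. One common per-column measure is Shannon entropy, sketched below under the assumptions that all sequences are already aligned to equal length, gaps are written as `-` and ignored, and the cutoff `max_entropy` is purely illustrative.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    0.0 means every non-gap residue is identical (fully conserved);
    larger values mean more variability.
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return float("nan")  # all-gap column: conservation undefined
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_columns(alignment, max_entropy=0.5):
    """Return 0-based indices of columns with entropy <= max_entropy.

    `max_entropy` is a hypothetical cutoff. All-gap columns (NaN) never
    pass the comparison, so they are excluded.
    """
    length = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    return [i for i, col in enumerate(columns)
            if column_entropy(col) <= max_entropy]


if __name__ == "__main__":
    aln = ["GAVL", "GAIL", "GSVL", "GA-L"]
    print(conserved_columns(aln))  # [0, 3]
```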

PHYLIP Bootstrap and Sequence Space Analyses
- Input the refined MSA into both algorithms
- SeqSpace calculates clusters and defines similarity vectors from the origin
- PHYLIP bootstrap iterations produced 1000 trees, which were combined into a consensus tree
- The combined output of PHYLIP and SeqSpace was used to define five subfamilies

Figure: Phylogenetic tree, with subfamilies labeled Cord Moss, HSP 16-20, Grasses, HSP 17, and HSP 22-23
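The bootstrap-and-consensus idea can be sketched as below, assuming each tree is represented as a set of clades. This is not PHYLIP's code: tree inference is omitted, the helper names are hypothetical, and it uses plain majority rule (clades present in more than half of the trees) because the poster does not say which consensus rule was applied. On the poster's data, the tree-building step would run once per replicate, 1000 times, before the consensus.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: resample alignment columns with replacement.

    The same resampled column indices are applied to every sequence, so the
    replicate keeps the alignment's shape. This mirrors the idea behind
    PHYLIP's SEQBOOT, not its file formats or options.
    """
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def majority_rule_clades(trees, threshold=0.5):
    """Clades present in more than `threshold` of the trees, with support.

    Each tree is a collection of clades, and each clade is a frozenset of
    taxon names; inferring the trees themselves is not shown.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n = len(trees)
    return {clade: k / n for clade, k in counts.items() if k / n > threshold}


if __name__ == "__main__":
    rng = random.Random(0)
    print(bootstrap_alignment(["ACGT", "ACGA", "TCGA"], rng))
    ab, abc, abd = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
    ac, acd = frozenset("AC"), frozenset("ACD")
    trees = [{ab, abc}, {ab, abc}, {ab, abd}, {ac, acd}]
    # Only the A+B clade passes (support 0.75); A+B+C sits exactly at 0.5.
    print(majority_rule_clades(trees))
```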

Figure: Sequence Space output (Dimension 2 and Dimension 3)

Group Entropy
- Used the Pittsburgh Supercomputing Center's (PSC) GEnt program
- Calculates the group entropy distance for each defined subfamily
- Gives a best-fit match for sequences that are still ungrouped
- Residues with higher scores are unique to a particular subfamily and essential to its specific function

Group entropy distance:
$$D = \sum_{i} (p_i - q_i) \log_2 \frac{p_i}{q_i}$$
where $p_i$ is the foreground residue frequency and $q_i$ is the background residue frequency of residue type $i$.
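The group entropy distance can be computed per alignment column as in this minimal Python sketch. It is not the GEnt implementation: treating the subfamily's column as the foreground and the whole family's column as the background, summing over the 20 standard amino acids with gaps ignored, and adding a pseudocount of 1 to every amino acid (so that no $p_i$ or $q_i$ is zero) are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(column, pseudocount=1.0):
    """Frequencies of the 20 amino acids in one column, gaps ignored.

    A pseudocount is added to every amino acid so no frequency is zero,
    since log2(p/q) is undefined at zero. The value 1.0 is an assumption.
    """
    counts = Counter(r for r in column if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def group_entropy_distance(subfamily_column, family_column, pseudocount=1.0):
    """D = sum over amino acids i of (p_i - q_i) * log2(p_i / q_i).

    Every term is >= 0 because (p_i - q_i) and log2(p_i / q_i) share a sign,
    so D is 0 only when foreground and background frequencies match.
    """
    p = residue_frequencies(subfamily_column, pseudocount)  # foreground
    q = residue_frequencies(family_column, pseudocount)     # background
    return sum((p[a] - q[a]) * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


if __name__ == "__main__":
    family = "GGGGGGGGWWWW"
    # A subfamily fixed on W scores higher than one matching the family's G.
    print(group_entropy_distance("WWWW", family))
    print(group_entropy_distance("GGGG", family))
```

The formula is a symmetrized relative entropy, which is why a column scores high whenever the subfamily's residue distribution departs strongly from the family's.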

Group Entropy
Residue labels have the form [alignment index][predominant subfamily amino acid]-[predominant family amino acid].
High group entropy indicates a conserved amino acid that is unique to the subfamily.

Figure: Group entropy for group HSP 17
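The residue label format can be generated as sketched below. Here "predominant" is taken to mean the most frequent non-gap residue, indices are assumed to be 1-based, and the helper names are hypothetical; the per-column scores would come from a group entropy calculation such as GEnt's.

```python
from collections import Counter


def predominant(column, gap="-"):
    """Most frequent non-gap residue in a column (ties: first encountered)."""
    residues = [r for r in column if r != gap]
    return Counter(residues).most_common(1)[0][0] if residues else gap


def residue_label(alignment_index, subfamily_column, family_column):
    """[alignment index][predominant subfamily aa]-[predominant family aa]."""
    return (f"{alignment_index}{predominant(subfamily_column)}"
            f"-{predominant(family_column)}")


def top_labels(scores, subfamily_aln, family_aln, n=5, start=1):
    """Label the n columns with the highest group entropy scores.

    `scores[i]` is the score of 0-based column i; labels use `start`-based
    indices (1-based by default, an assumption).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [residue_label(i + start,
                          "".join(s[i] for s in subfamily_aln),
                          "".join(s[i] for s in family_aln))
            for i in ranked[:n]]


if __name__ == "__main__":
    print(residue_label(57, "WWWS", "GGGGWWWS"))  # 57W-G
```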

Figure: Residues from HSP 17 with high group entropy

Conclusions
- Evolutionary relationships suggest that the different HSP variants arose through gene duplication
- An HSP is more closely related to HSPs in species similar to the one in which it is found than to HSPs of comparable molecular weight in more distantly related species
- HSPs are highly conserved over the whole family; very specific residue alterations give particular subfamilies their individual properties

The data collected in this study can be analyzed further by comparing the highly conserved residues found in each group. These residues can be matched with data on the specific functions of each heat shock protein to generate hypotheses about how they contribute to functional specificity and biochemical properties.

Resources
- Bailey T.L., Elkan C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36. AAAI Press, Menlo Park, California.
- Casari G., Sander C., Valencia A. 1995. A method to predict functional residues in proteins. Nature Structural Biology 2: 171-178.
- Felsenstein J. 2004. PHYLIP: Phylogeny Inference Package. Department of Genome Sciences, University of Washington. http://evolution.genetics.washington.edu/phylip/doc/main.html
- Gong L., Puri M., et al. 2004. Drosophila ventral furrow morphogenesis: a proteomic analysis. Development 131: 643-656.
- Nicholas H.B. Jr., Ropelewski A., Deerfield D.W. II. 2000. Strategies for searching sequence databases. BioTechniques 28: 1174-1191.
- Nicholas H.B. Jr., Ropelewski A., Deerfield D.W. II. 2002. Strategies of multiple sequence alignment. BioTechniques 32: 572-591.
- Notredame C., Higgins D., Heringa J. 2000. T-Coffee: A novel method for multiple sequence alignments. J. Mol. Biol. 302: 205-217.

Acknowledgements
- Dr. Hugh Nicholas Jr.
- Pittsburgh Supercomputing Center
- Rajan Munshi
- National Institutes of Health
- National Science Foundation
- Everyone in BBSI