PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification


Nucleic Acids Research, 2003, Vol. 31, No. 1
© 2003 Oxford University Press
DOI: /nar/gkg115

Paul D. Thomas*, Anish Kejariwal, Michael J. Campbell, Huaiyu Mi, Karen Diemer, Nan Guo, Istvan Ladunga, Betty Ulitsky-Lazareva, Anushya Muruganujan, Steven Rabkin, Jody A. Vandergriff and Olivier Doremieux

Protein Informatics, Celera Genomics, 850 Lincoln Center Drive, Foster City, CA 94404, USA

Received August 30, 2002; Revised and Accepted October 27, 2002

ABSTRACT

The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. PANTHER is publicly available on the web at

INTRODUCTION

The PANTHER database was designed for high-throughput functional analysis of large sets of protein sequences (1). It has been used to annotate the human genome (2) as well as the Drosophila genome (3).
Like databases such as Pfam (4) and SMART (5), PANTHER uses a library of Hidden Markov Models (HMMs) to annotate sequences with information from homologous sequences. However, unlike these databases, the goal of PANTHER is not to annotate individual domains, but the overall biological function(s) of the molecule. Also unlike these other databases, because many protein families have branches that have diverged in function during evolution, the PANTHER library contains HMMs not only for families, but also for functionally distinct subfamilies. In these cases, subfamily annotation allows a much more precise definition of nomenclature and biological function. PANTHER is composed of two main components: the PANTHER library (PANTHER/LIB) and the PANTHER index (PANTHER/X). PANTHER/LIB is a collection of books, each representing a protein family as a multiple sequence alignment, an HMM and a family tree. Functional divergence within the family is represented by first dividing the tree into subtrees (subfamilies) based on shared function, and then constructing a distinct HMM for each subfamily. PANTHER/X is an abbreviated ontology for summarizing and navigating molecular (biochemical) functions and biological processes (such as pathways, cellular roles or even physiological functions). Families and subfamilies are defined and named by biologist curators, who then associate each group of sequences with terms in the PANTHER/X ontology. Protein query sequences can then be scored against the functionally-labelled family and subfamily HMMs. Query sequences are classified with the name and functional assignments of the best-scoring HMM, with the HMM score providing an estimate of the confidence level of the classification. 
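As an illustration of the data model described above, the following is a minimal sketch of how one PANTHER/LIB book and its curated labels might be represented. The class and field names (`Group`, `Book`, `labelled_models`) are hypothetical and not part of PANTHER. The identifiers and ontology terms are the ones this article gives for the laminin-related family and its perlecan subfamily.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Group:
    """A curated group of sequences (a family or subfamily) and its labels."""
    name: str
    molecular_functions: Tuple[str, ...] = ()
    biological_processes: Tuple[str, ...] = ()


@dataclass
class Book:
    """One PANTHER/LIB 'book' (hypothetical representation).

    The alignment, tree and HMM parameters that a real book carries
    are omitted from this sketch.
    """
    family_id: str
    family: Group
    subfamilies: Dict[str, Group] = field(default_factory=dict)

    def labelled_models(self) -> List[Tuple[str, Group]]:
        """List the family model followed by its subfamily models."""
        models = [(self.family_id, self.family)]
        for sf_id in sorted(self.subfamilies):
            models.append((f"{self.family_id}:{sf_id}", self.subfamilies[sf_id]))
        return models


book = Book(
    family_id="CF10574",
    family=Group(name="laminin-related"),  # no ontology terms at family level
    subfamilies={
        "SF31": Group(
            name="heparan sulfate proteoglycan perlecan",
            molecular_functions=("extracellular matrix glycoprotein",),
            biological_processes=(
                "cell adhesion",
                "cell adhesion-mediated signalling",
            ),
        )
    },
)

for model_id, group in book.labelled_models():
    print(model_id, group.name, group.molecular_functions)
```

Keeping the labels on the group rather than on individual sequences mirrors the curation strategy: any sequence that later matches a group's HMM can inherit that group's name and ontology terms.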
Like other HMM-based approaches, PANTHER classification scales well for genome projects: the curated functional assignment is performed up-front on sets of training sequences that span many organisms, and can then be transferred to other organisms using the labelled HMMs. As a result, the PANTHER database classifies a significantly larger fraction of human genes than does LocusLink (Table 1). PANTHER has been available to Celera Discovery System (CDS) (7) subscribers for almost two years, and is now publicly available to academic users at com. The public version uses the GenBank non-redundant protein database to define sets of training sequences for HMMs. These HMMs are used to classify human gene products from LocusLink, and Drosophila melanogaster gene products from FlyBase ( release3download.shtml). The CDS version includes training proteins from the sets curated at Celera, with additional HMM scoring of Celera-curated human and mouse gene products.

*To whom correspondence should be addressed. Email: paul.thomas@fc.celera.com

BROWSING GENES BY FUNCTION

A key feature of PANTHER is that it can be browsed by protein function, which makes it more accessible to biologists. Browsing controlled vocabulary terms can be much simpler than trying to construct effective queries in databases that have free-text annotations. The primary entry point into PANTHER is the PANTHER Prowler, which uses the file-folder analogy to navigate PANTHER/X molecular functions and biological processes (Fig. 1).

Figure 1. Browsing the PANTHER database by biological functions. (A) Selection of biological processes under lipid, fatty acid and steroid metabolism (note that categories can be independently selected/deselected, so, for example, steroid metabolism has been deselected). (B) Retrieval of protein families and subfamilies assigned by curators to the selected functional categories. (C) Retrieval of a list of human genes encoding proteins that match the selected family and subfamily HMMs.

The PANTHER/X ontology is essentially hierarchical, though, more accurately, it is a directed acyclic graph, because child categories occasionally appear under more than one parent where this is biologically justified. For example, the biological process DNA replication is a child of two categories: (1) nucleoside, nucleotide and nucleic acid metabolism, and (2) cell cycle. PANTHER/X contains many of the same higher-level categories as the more comprehensive Gene Ontology (GO) (8), and has been mapped to GO (3), but is arranged quite differently in order to facilitate navigation and large-scale analysis of protein sets. PANTHER/X also contains a number of vertebrate-specific categories that do not appear in the current release of GO, such as additional developmental and immune system categories.
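The directed-acyclic-graph structure and the select/deselect browsing behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the `Ontology` class and `select` function are hypothetical, the child category "fatty acid metabolism" is invented for the example, and the rule that deselection removes a category together with everything beneath it is an assumption about how the Prowler might behave.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class Ontology:
    """A small directed acyclic graph of category names (hypothetical)."""

    def __init__(self) -> None:
        self.parents: Dict[str, Set[str]] = defaultdict(set)
        self.children: Dict[str, Set[str]] = defaultdict(set)

    def add(self, parent: str, child: str) -> None:
        """Record that `child` appears under `parent`."""
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def subtree(self, term: str) -> Set[str]:
        """Return `term` together with all of its descendants."""
        seen: Set[str] = set()
        stack = [term]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.children.get(node, ()))
        return seen


def select(onto: Ontology, selected: Iterable[str],
           deselected: Iterable[str] = ()) -> Set[str]:
    """Categories in force after selecting and then deselecting subtrees."""
    chosen: Set[str] = set()
    for term in selected:
        chosen |= onto.subtree(term)
    for term in deselected:
        chosen -= onto.subtree(term)
    return chosen


onto = Ontology()
# DNA replication has two parents, as in the article's example.
onto.add("nucleoside, nucleotide and nucleic acid metabolism", "DNA replication")
onto.add("cell cycle", "DNA replication")
onto.add("lipid, fatty acid and steroid metabolism", "steroid metabolism")
# Hypothetical sibling category, added only to make the example non-trivial.
onto.add("lipid, fatty acid and steroid metabolism", "fatty acid metabolism")

print(sorted(onto.parents["DNA replication"]))
print(sorted(select(onto,
                    ["lipid, fatty acid and steroid metabolism"],
                    deselected=["steroid metabolism"])))
```

Storing both parent and child links makes the multi-parent case explicit: DNA replication is reached whether the user starts from the metabolism branch or from the cell cycle branch.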
After a set of functions is selected, the Prowler retrieves the list of protein families and/or subfamilies that have been previously assigned, by biologist curators, to those functions.

Table 1. The percentage of human genes (approximated by LocusLink entries) having functional ontology classifications from PANTHER and from LocusLink GO associations

                           LocusLink GO   PANTHER/X
Molecular function (NP)    42%            52%
Molecular function (XP)    0%             19%
Biological process (NP)    41%            46%
Biological process (XP)    0%             17%

Percentages of genes classified are shown for two sets of LocusLink entries: NP (with a curated RefSeq protein, accession beginning with NP, total: ), and XP (with only a provisional RefSeq entry, accession beginning with XP, total: ). The total number of LocusLink entries that hit a PANTHER HMM is 9276 (67%) for NP, and 9141 (24%) for XP.

A user can make further selections in the family/subfamily list, and then generate a list of proteins or genes that scored significantly against the HMMs for the selected families and subfamilies. In the current version, gene lists are available for LocusLink human genes and FlyBase Drosophila genes. The LocusLink and FlyBase sequences used to create these gene lists are updated on a monthly basis. Gene lists can be sorted and easily exported in tab-delimited format. In addition to browsing, PANTHER can be accessed by text searching of curator-assigned family and subfamily names, or of the GenBank identifiers or definition lines of training sequences. Training sequences for the classification can also be searched by BLASTP (9).

SUPPORTING DATA: PHYLOGENETIC TREES, MULTIPLE SEQUENCE ALIGNMENTS AND SEQUENCE ANNOTATION

For each PANTHER family, data are available to support the curated classifications. The multiple sequence alignments used to generate the phylogenetic trees can be downloaded and viewed in a web browser. One of the features of the MSA viewer is that it highlights not only family-conserved columns (amino acids conserved across the entire family), but also subfamily-conserved columns (amino acids conserved within a subfamily but not found in other subfamilies). Curator-defined subfamilies have distinct annotations and often distinct functions, so these subfamily-conserved columns provide hypotheses about which residues may mediate functional divergence or specificity (Fig. 2). The phylogenetic trees, including the curator-defined subfamily divisions, can be viewed as GIF images.

Figure 2. The PANTHER multiple sequence alignment view, highlighting globally conserved positions (black and gray), and subfamily-specific conservation patterns that may indicate residues important for functional specificity (red). Pfam domains are shown as blue bars, one for each subfamily.
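The distinction between family-conserved and subfamily-conserved alignment columns can be sketched as below. This is a simplified illustration, not the viewer's actual method: it assumes strict identity as the definition of "conserved" (a real viewer may well use a softer threshold), treats gap characters as non-conserved, and uses made-up subfamily names and sequences.

```python
from typing import Dict, List, Tuple


def conserved_columns(
    alignment: Dict[str, List[str]],
) -> Tuple[List[int], Dict[str, List[int]]]:
    """Find family-conserved and subfamily-conserved columns.

    `alignment` maps a subfamily name to its aligned sequences, all of
    equal length. A column is family-conserved when every sequence has
    the same residue. It is subfamily-conserved for a subfamily when
    that subfamily has a single residue there and the residue does not
    occur in any other subfamily at that column.
    """
    all_seqs = [seq for seqs in alignment.values() for seq in seqs]
    length = len(all_seqs[0])
    family_cols: List[int] = []
    subfamily_cols: Dict[str, List[int]] = {name: [] for name in alignment}

    for i in range(length):
        column = {seq[i] for seq in all_seqs}
        if len(column) == 1 and "-" not in column:
            family_cols.append(i)
            continue
        for name, seqs in alignment.items():
            residues = {seq[i] for seq in seqs}
            if len(residues) != 1 or "-" in residues:
                continue
            residue = next(iter(residues))
            elsewhere = {
                seq[i]
                for other, other_seqs in alignment.items()
                if other != name
                for seq in other_seqs
            }
            if residue not in elsewhere:
                subfamily_cols[name].append(i)
    return family_cols, subfamily_cols


# Toy alignment with two hypothetical subfamilies.
toy = {
    "SF1": ["MKLA", "MKLG"],
    "SF2": ["MRLS", "MRLT"],
}
family_cols, subfamily_cols = conserved_columns(toy)
print(family_cols)      # columns identical across the whole family
print(subfamily_cols)   # columns that distinguish each subfamily
```

In the toy alignment, the second column (K in one subfamily, R in the other) is the kind of position the viewer would flag as a candidate for functional specificity.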
Subfamily nodes in the phylogenetic tree can be expanded to view sequence-level annotations from GenBank and SWISS-PROT (10), to verify curator definitions (Fig. 3). We also provide forms to make it easy for users of PANTHER to help correct names and ontology associations, and keep them up-to-date.

Figure 3. The PANTHER tree-attribute view for verifying curation. (A) The collapsed view, showing the curator-defined subfamilies and ontology associations. (B) The expanded view, showing all of the constituent sequences and their annotations.

ACCURATE ASSIGNMENT OF FUNCTION USING HMMS FROM CURATED PROTEIN FAMILIES AND SUBFAMILIES

PANTHER/X functional ontology associations for gene products have been shown to be very accurate (3), primarily due to the emphasis on biologist curation, and to the tree-based homology inference method.

Curators define subfamilies in the context of a phylogenetic tree

Much of the curation of the PANTHER library is performed in the context of a phylogenetic tree (1). Trees are constructed for each family to represent the sequence-level relationships. A biologist curator then reviews the tree, dividing it into subtrees (subfamilies) such that all the sequences in a given subfamily can be given the same name and functional assignments. Names are free-text (following a set of defined guidelines available on the website), while the functional assignments use controlled PANTHER/X ontology terms. The family and subfamily groupings provide sets of training sequences for building HMMs.

The design of PANTHER, and the curation effort in particular, has been biased toward functional annotation and ontology classification. Most of the curation effort is devoted to assigning functions in the context of a phylogenetic tree representation, using functional information from SWISS-PROT and GenBank records, as well as more detailed information, if necessary, in OMIM ( nih.gov/omim/) and PubMed abstracts. A PANTHER family is defined to be as diverse as possible (increasing the number of sequences from which functional inferences can be made) while keeping it tight enough that the resulting tree is accurate. In the current version of PANTHER, we do not hand-curate the alignments or trees, or even demand that families be mutually exclusive; instead, curators judge them on how well they perform functional annotation. The tree-building algorithm is based on a distance metric derived from HMM scoring, so if proteins with the same function are located in the same subtree, the resulting subfamily HMMs will be predictive of function.

Figure 4. Examples of PANTHER subfamilies capturing functional divergence. (A) Laminin-related proteins have divergent domain structures (which correlates with divergence within the shared laminin domain), while (B) Secretin-related GPCRs have divergent sequences within a common domain. Both cases can generally be modelled using subfamily HMMs.

Competition between family and subfamily-level HMMs allows appropriate homology-based inference

The family and subfamily HMMs are then used to score sequences that were not in the training set. One of the advantages of PANTHER is the ability to assign specific functions, without overgeneralization. A sequence database search commonly assigns function based on the best hit. The advantage is that this assignment can be very specific, such as a GPCR having serotonin as a ligand. The disadvantage is that it is difficult to know when the query is too distant from the hit, and that the inference of serotonin binding is therefore incorrect. A family database search, on the other hand, will generally be correct in associating a sequence with a family, but cannot capture the specificity of function in divergent families. For example, there are members of the aldo-keto reductase family that function as ion channel subunits. PANTHER combines the advantages of both methods, by including both family and subfamily models in the HMM library.
If the best hit is a subfamily HMM, and the HMM score is above the accepted threshold, then a specific annotation can be made, while a family HMM best hit often allows a less specific annotation. Following the example above, a family-level best hit will result in the annotation 'aldo-keto reductase 2 family member' and no curated ontology terms, while a subfamily hit results in the annotation 'potassium voltage-gated channel, beta subunit (family 6, subfamily A)', and the ontology associations voltage-gated potassium channel (molecular function) and cation transport (biological process).

In the current release of PANTHER, all significant HMM scores are stored for each FlyBase Drosophila protein and LocusLink human protein. The classification of each gene product is based on the best HMM score. For non-experts, whenever an HMM score is reported, it is accompanied by a relation icon that indicates the relative certainty of the classification. As the scores become less significant, the probability becomes higher that the classification is in error. Even using a permissive score cutoff of 35 ('distantly related', i.e. the lowest degree of certainty), the total error rate for Drosophila molecular function classifications was shown to be less than 2% (3). Because PANTHER/LIB comprises over HMMs, it is not yet practical to provide a general web interface for HMM scoring of user-defined sequences. However, PANTHER/LIB HMM scoring can be made available as an additional service, or for collaborations.

PANTHER HMM annotations can differ from domain-based HMM annotation

Databases such as Pfam and SMART have used the HMM formalism to provide an extremely useful tool for identifying conserved functional and structural domains in a protein sequence. PANTHER uses HMMs somewhat differently, with the goal of annotating the overall biological function of a protein. Like Pfam and SMART, the PANTHER family-level HMMs often have a functional annotation based on a single domain.
PANTHER subfamily-level HMMs (and many family-level HMMs as well), however, can be more informative than the simple sum of the individual domain annotations. For example, the protein encoded by the human gene HSPG2 contains many different domains, including the LDL receptor A domain, epidermal growth factor repeat-like domains, immunoglobulin-like domains and both laminin B and laminin G domains. Each of these domains is found in different combinations across a variety of proteins having divergent functions. The only one of these domains that can be assigned a consistent function is the laminin-type EGF domain, which has been assigned by Interpro to the Gene Ontology (molecular function) term structural molecule. By contrast, the highest scoring PANTHER HMM is the subfamily heparan sulfate proteoglycan perlecan (CF10574:SF31), which is assigned to the PANTHER/X ontology terms (molecular function) extracellular matrix glycoprotein, and (biological processes) cell adhesion and cell adhesion-mediated signalling. This is a specific subfamily of the broader PANTHER family laminin-related (CF10574), which, like the Pfam laminin B and G domains, is not assigned to any functional terms (Fig. 4A).

Even for single-domain proteins the PANTHER subfamily HMMs often allow for more specific functional inferences than is possible from more general HMMs such as Pfam and SMART. For example, the CALCR gene product hits the Pfam HMM for the secretin-like seven transmembrane receptor family, which is assigned to the GO molecular function G protein-coupled receptor. The highest-scoring PANTHER HMM is the subfamily calcitonin receptor (CF12011:SF18), which is assigned to G protein-coupled receptor, as well as to the biological processes skeletal development and other neuronal activities. The more specific assignments are correct for this subfamily but not for all members in the larger family (Fig. 4B).
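The competition between family-level and subfamily-level HMMs described in this section can be sketched as a best-hit rule over precomputed scores. This is a minimal sketch under stated assumptions, not the PANTHER implementation: the model identifiers (`AKR`, `AKR:SF_K`), the example scores and the `classify` function are hypothetical, higher scores are assumed to be more significant, and the default cutoff of 35 is simply the permissive value quoted in the text. The HMM scoring itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LabelledHmm:
    """Curated labels attached to one family or subfamily HMM."""
    name: str
    level: str  # "family" or "subfamily"
    molecular_function: Tuple[str, ...] = ()
    biological_process: Tuple[str, ...] = ()


def classify(
    scores: Dict[str, float],
    library: Dict[str, LabelledHmm],
    cutoff: float = 35.0,
) -> Optional[Tuple[str, LabelledHmm]]:
    """Return the best-scoring model at or above `cutoff`, or None."""
    significant = {
        hmm_id: score
        for hmm_id, score in scores.items()
        if hmm_id in library and score >= cutoff
    }
    if not significant:
        return None
    best = max(significant, key=lambda hmm_id: significant[hmm_id])
    return best, library[best]


# Labels follow the aldo-keto reductase example in the text;
# the identifiers are invented for this sketch.
library = {
    "AKR": LabelledHmm("aldo-keto reductase 2 family member", "family"),
    "AKR:SF_K": LabelledHmm(
        "potassium voltage-gated channel, beta subunit",
        "subfamily",
        molecular_function=("voltage-gated potassium channel",),
        biological_process=("cation transport",),
    ),
}

channel_like = {"AKR": 120.0, "AKR:SF_K": 310.0}  # hypothetical scores
enzyme_like = {"AKR": 150.0, "AKR:SF_K": 40.0}    # hypothetical scores
weak_hit = {"AKR": 12.0}                          # below the cutoff

for query in (channel_like, enzyme_like, weak_hit):
    result = classify(query, library)
    print(result[1].level if result else "unclassified")
```

Because family and subfamily models sit in the same library, a query that resembles the family only in general terms is won by the family model and receives the less specific annotation, while a query close to a curated subfamily inherits that subfamily's ontology terms.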
ACKNOWLEDGEMENTS

We thank Kimmen Sjolander, Gangadharan Subramanian, Mark Yandell, Anthony Kerlavage, Richard Mural and Michael Ashburner for helpful discussions. We thank Matteo di Tommaso, James Jordan, Brian Karlak and Bruce Moxon for critical software engineering assistance. We also thank the many biologists who helped to curate the PANTHER library.

REFERENCES

1. Thomas,P.D., Campbell,M.J., Kejariwal,A., Mi,H., Karlak,B., Daverman,R., Diemer,K. and Muruganujan,A. PANTHER: a library of protein families and subfamilies indexed by function, submitted.
2. Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J. et al. (2001) The sequence of the human genome. Science, 291,
3. Mi,H., Vandergriff,J., Campbell,M., Narechania,A., Lewis,S., Thomas,P.D. and Ashburner,M. Assessment of genome-wide protein function classification for Drosophila melanogaster, submitted.
4. Sonnhammer,E.L., Eddy,S.R. and Durbin,R. (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins, 28,
5. Schultz,J., Milpetz,F., Bork,P. and Ponting,C.P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl Acad. Sci. USA, 95,
6. Pruitt,K.D., Katz,K.S., Sicotte,H. and Maglott,D.R. (2000) Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet., 16,
7. Kerlavage,A., Bonazzi,V., di Tommaso,M., Lawrence,C., Li,P., Mayberry,F., Mural,R., Nodell,M., Yandell,M., Zhang,J. and Thomas,P.D. (2002) The Celera Discovery System. Nucleic Acids Res., 30,
8. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T., Harris,M.A., Hill,D.P., Issel-Tarver,L., Kasarskis,A., Lewis,S., Matese,J.C., Richardson,J.E., Ringwald,M., Rubin,G.M. and Sherlock,G. (2000) Gene ontology: tool for the unification of biology. Nature Genet., 25,
9. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215,
10. Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in Nucleic Acids Res., 28,


More information

objective functions...

objective functions... objective functions... COFFEE (Notredame et al. 1998) measures column by column similarity between pairwise and multiple sequence alignments assumes that the pairwise alignments are optimal assumes a set

More information

Supplementary Information 16

Supplementary Information 16 Supplementary Information 16 Cellular Component % of Genes 50 45 40 35 30 25 20 15 10 5 0 human mouse extracellular other membranes plasma membrane cytosol cytoskeleton mitochondrion ER/Golgi translational

More information
