PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification
Nucleic Acids Research, 2003, Vol. 31, No. 1
© 2003 Oxford University Press
DOI: /nar/gkg115

Paul D. Thomas*, Anish Kejariwal, Michael J. Campbell, Huaiyu Mi, Karen Diemer, Nan Guo, Istvan Ladunga, Betty Ulitsky-Lazareva, Anushya Muruganujan, Steven Rabkin, Jody A. Vandergriff and Olivier Doremieux

Protein Informatics, Celera Genomics, 850 Lincoln Center Drive, Foster City, CA 94404, USA

Received August 30, 2002; Revised and Accepted October 27, 2002

ABSTRACT

The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. PANTHER is publicly available on the web at

INTRODUCTION

The PANTHER database was designed for high-throughput functional analysis of large sets of protein sequences (1). It has been used to annotate the human genome (2) as well as the Drosophila genome (3).
Like databases such as Pfam (4) and SMART (5), PANTHER uses a library of Hidden Markov Models (HMMs) to annotate sequences with information from homologous sequences. However, unlike these databases, the goal of PANTHER is not to annotate individual domains, but the overall biological function(s) of the molecule. Also unlike these other databases, because many protein families have branches that have diverged in function during evolution, the PANTHER library contains HMMs not only for families, but also for functionally distinct subfamilies. In these cases, subfamily annotation allows a much more precise definition of nomenclature and biological function. PANTHER is composed of two main components: the PANTHER library (PANTHER/LIB) and the PANTHER index (PANTHER/X). PANTHER/LIB is a collection of books, each representing a protein family as a multiple sequence alignment, an HMM and a family tree. Functional divergence within the family is represented by first dividing the tree into subtrees (subfamilies) based on shared function, and then constructing a distinct HMM for each subfamily. PANTHER/X is an abbreviated ontology for summarizing and navigating molecular (biochemical) functions and biological processes (such as pathways, cellular roles or even physiological functions). Families and subfamilies are defined and named by biologist curators, who then associate each group of sequences with terms in the PANTHER/X ontology. Protein query sequences can then be scored against the functionally-labelled family and subfamily HMMs. Query sequences are classified with the name and functional assignments of the best-scoring HMM, with the HMM score providing an estimate of the confidence level of the classification. 
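The classification rule described above (score a query against every family and subfamily HMM, then take the name and functional assignments of the best-scoring model) can be sketched as follows. This is a minimal illustration, not the PANTHER implementation: the `HmmHit` type, the model identifiers and the scores are hypothetical, and the sketch assumes that a higher score means a better match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HmmHit:
    model_id: str   # hypothetical identifier, e.g. "FAM1:SF2" for a subfamily
    name: str       # curator-assigned name attached to the model
    score: float    # assumed convention: higher score = better match


def classify(hits):
    """Return the best-scoring hit, or None when the query hit no model."""
    if not hits:
        return None
    return max(hits, key=lambda h: h.score)


# Hypothetical scores for one query sequence against three models.
hits = [
    HmmHit("FAM1", "example family", 120.0),
    HmmHit("FAM1:SF2", "example subfamily", 310.5),
    HmmHit("FAM7", "unrelated family", 41.0),
]

best = classify(hits)
print(best.model_id, best.name, best.score)
```

The score of the winning model is kept alongside its name because, as the text notes, it serves as an estimate of the confidence of the classification.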
Like other HMM-based approaches, PANTHER classification scales well for genome projects: the curated functional assignment is performed up-front on sets of training sequences that span many organisms, and can then be transferred to other organisms using the labelled HMMs. As a result, the PANTHER database classifies a significantly larger fraction of human genes than does LocusLink (Table 1). PANTHER has been available to Celera Discovery System (CDS) (7) subscribers for almost two years, and is now publicly available to academic users at com. The public version uses the GenBank non-redundant protein database to define sets of training sequences for HMMs. These HMMs are used to classify human gene products from LocusLink, and Drosophila melanogaster gene products from FlyBase ( release3download.shtml). The CDS version includes training proteins from the sets curated at Celera, with additional HMM scoring of Celera-curated human and mouse gene products.

*To whom correspondence should be addressed. paul.thomas@fc.celera.com

Figure 1. Browsing the PANTHER database by biological functions. (A) Selection of biological processes under "lipid, fatty acid and steroid metabolism" (note that categories can be independently selected/deselected, so, for example, "steroid metabolism" has been deselected). (B) Retrieval of protein families and subfamilies assigned by curators to the selected functional categories. (C) Retrieval of a list of human genes encoding proteins that match the selected family and subfamily HMMs.

BROWSING GENES BY FUNCTION

A key feature of PANTHER is that it can be browsed by protein functions, facilitating access for biologists. Browsing of controlled vocabulary terms can be much simpler than trying to construct effective queries in databases that have free text annotations. The primary entry point into PANTHER is the PANTHER Prowler, which uses the file-folder analogy to navigate PANTHER/X molecular functions and biological processes (Fig. 1). The PANTHER/X ontology is essentially hierarchical, though, more accurately, it is a directed acyclic graph, as child categories occasionally appear under more than one parent if it is biologically justified. For example, the biological process "DNA replication" is a child of two categories: (1) "nucleoside, nucleotide and nucleic acid metabolism", and (2) "cell cycle". PANTHER/X contains many of the same higher-level categories as the more comprehensive Gene Ontology (GO) (8), and has been mapped to GO (3), but is arranged quite differently in order to facilitate navigation and large-scale analysis of protein sets. PANTHER/X also contains a number of vertebrate-specific categories that do not appear in the current release of GO, such as additional developmental and immune system categories.
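The difference between a strict hierarchy and a directed acyclic graph can be made concrete with the DNA replication example. The sketch below stores, for each category, the set of its parents, and collects every ancestor of a term. Only the three category names come from the text; the root node "biological process" and the data layout are assumptions made for illustration.

```python
# Child category -> set of parent categories. A tree would allow at most one
# parent per child; a directed acyclic graph allows several.
# The "biological process" root is an assumed placeholder, not taken from PANTHER/X.
PARENTS = {
    "DNA replication": {
        "nucleoside, nucleotide and nucleic acid metabolism",
        "cell cycle",
    },
    "nucleoside, nucleotide and nucleic acid metabolism": {"biological process"},
    "cell cycle": {"biological process"},
    "biological process": set(),
}


def ancestors(term, parents=PARENTS):
    """Collect every category reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


for name in sorted(ancestors("DNA replication")):
    print(name)
```

A consequence of this structure is that selecting either parent category in a browser would be expected to retrieve families assigned to "DNA replication", which is the behaviour that motivates allowing more than one parent.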
After a set of functions is selected, the Prowler retrieves the list of protein families and/or subfamilies that have been previously assigned, by biologist curators, to those functions.

Table 1. The percentage of human genes (approximated by LocusLink entries) having functional ontology classifications from PANTHER and from LocusLink GO associations

                             LocusLink GO    PANTHER/X
Molecular function (NP)      42%             52%
Molecular function (XP)      0%              19%
Biological process (NP)      41%             46%
Biological process (XP)      0%              17%

Percentages of genes classified are shown for two sets of LocusLink entries: NP (with a curated RefSeq protein, accession beginning with NP, total: ), and XP (with only a provisional RefSeq entry, accession beginning with XP, total: ). The total number of LocusLink entries that hit a PANTHER HMM is 9276 (67%) for NP, and 9141 (24%) for XP.
Figure 2. The PANTHER multiple sequence alignment view, highlighting globally conserved positions (black and gray), and subfamily-specific conservation patterns that may indicate residues important for functional specificity (red). Pfam domains are shown as blue bars, one for each subfamily.

A user can make further selections in the family/subfamily list, and then generate a list of proteins or genes that scored significantly against the HMMs for the selected families and subfamilies. In the current version, gene lists are available for LocusLink human genes and FlyBase Drosophila genes. The LocusLink and FlyBase sequences used to create these gene lists are updated on a monthly basis. Gene lists can be sorted and easily exported in tab-delimited format. In addition to browsing, PANTHER can be accessed by text searching of curator-assigned family and subfamily names, or of the GenBank identifiers or definition lines of training sequences. Training sequences for the classification can also be searched by BLASTP (9).

SUPPORTING DATA: PHYLOGENETIC TREES, MULTIPLE SEQUENCE ALIGNMENTS AND SEQUENCE ANNOTATION

For each PANTHER family, data are available to support the curated classifications. The multiple sequence alignments used to generate the phylogenetic trees can be downloaded and viewed in a web browser. One of the features of the MSA viewer is that it highlights not only family-conserved columns (amino acids conserved across the entire family), but also subfamily-conserved columns (amino acids conserved within a subfamily but not found in other subfamilies). Curator-defined subfamilies have distinct annotations and often distinct functions, so these subfamily-conserved columns provide hypotheses about which residues may mediate functional divergence or specificity (Fig. 2). The phylogenetic trees, including the curator-defined subfamily divisions, can be viewed as GIF images.
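The two kinds of conserved alignment columns described for the MSA viewer can be illustrated with a small sketch. It assumes the strictest reading of the definitions: a column is family-conserved when every sequence carries the same residue, and subfamily-conserved when all members of one subfamily share a residue that occurs in no other subfamily at that position. The actual viewer may well use a softer conservation threshold; the function names and the toy alignment are hypothetical.

```python
def conserved_residue(column):
    """Return the shared residue if the column is invariant and gap-free, else None."""
    residues = set(column)
    if len(residues) == 1 and "-" not in residues:
        return next(iter(residues))
    return None


def classify_columns(alignment, subfamily_of):
    """Split alignment columns into family-conserved and subfamily-conserved.

    alignment:    dict of sequence id -> aligned sequence (all the same length)
    subfamily_of: dict of sequence id -> subfamily label
    Returns (family_cols, subfamily_cols), where subfamily_cols maps each
    subfamily label to a list of column indices.
    """
    ids = list(alignment)
    length = len(alignment[ids[0]])
    groups = {}
    for seq_id in ids:
        groups.setdefault(subfamily_of[seq_id], []).append(seq_id)

    family_cols = []
    subfamily_cols = {label: [] for label in groups}
    for i in range(length):
        if conserved_residue([alignment[s][i] for s in ids]) is not None:
            family_cols.append(i)
            continue
        for label, members in groups.items():
            residue = conserved_residue([alignment[s][i] for s in members])
            if residue is None:
                continue
            # Residues seen at this position outside the subfamily.
            others = {alignment[s][i] for s in ids if s not in members}
            if residue not in others:
                subfamily_cols[label].append(i)
    return family_cols, subfamily_cols


# Toy alignment with two subfamilies, A and B (made-up sequences).
toy = {"a1": "MKTA", "a2": "MKSA", "b1": "MRTG", "b2": "MRLG"}
labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(classify_columns(toy, labels))
```

In the toy alignment, column 0 is conserved across the whole family, while columns 1 and 3 separate the two subfamilies and would be the candidate positions for functional specificity.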
Subfamily nodes can be expanded to view sequence-level annotations from GenBank and SWISS-PROT (10), to verify curator definitions (Fig. 3). We also provide forms to make it easy for users of PANTHER to help correct names and ontology associations, and keep them up-to-date.

ACCURATE ASSIGNMENT OF FUNCTION USING HMMS FROM CURATED PROTEIN FAMILIES AND SUBFAMILIES

PANTHER/X functional ontology associations for gene products have been shown to be very accurate (3), primarily
due to the emphasis on biologist curation, and to the tree-based homology inference method.

Figure 3. The PANTHER tree-attribute view for verifying curation. (A) The collapsed view, showing the curator-defined subfamilies and ontology associations. (B) The expanded view, showing all of the constituent sequences and their annotations.

Curators define subfamilies in the context of a phylogenetic tree

Much of the curation of the PANTHER library is performed in the context of a phylogenetic tree (1). Trees are constructed for each family to represent the sequence-level relationships. A biologist curator then reviews the tree, dividing it into subtrees (subfamilies) such that all the sequences in a given subfamily can be given the same name and functional assignments. Names are free-text (following a set of defined guidelines available on the website), while the functional assignments use controlled PANTHER/X ontology terms. The family and subfamily groupings provide sets of training sequences for building HMMs. The design of PANTHER, and the curation effort in particular, has been biased toward functional annotation and ontology classification. Most of the curation effort is devoted to assigning functions in the context of a phylogenetic tree representation, using functional information from SWISS-PROT and GenBank records, as well as more detailed information, if necessary, in OMIM ( nih.gov/omim/) and PubMed abstracts. A PANTHER family is defined to be as diverse as possible (increasing the number of sequences from which functional inferences can be made) while keeping it tight enough that the resulting tree is accurate. In the current version of PANTHER, we do not hand-curate the alignments or trees, or even demand that families be mutually exclusive; instead, curators judge them on how well they perform functional annotation. The tree-building algorithm is based on a distance metric derived from HMM scoring, so if proteins with the same function are located in the same subtree, the resulting subfamily HMMs will be predictive of function.

Figure 4. Examples of PANTHER subfamilies capturing functional divergence. (A) Laminin-related proteins have divergent domain structures (which correlates with divergence within the shared laminin domain), while (B) Secretin-related GPCRs have divergent sequences within a common domain. Both cases can generally be modelled using subfamily HMMs.

Competition between family and subfamily-level HMMs allows appropriate homology-based inference

The family and subfamily HMMs are then used to score sequences that were not in the training set. One of the advantages of PANTHER is the ability to assign specific functions, without overgeneralization. A sequence database search commonly assigns function based on the best hit. The advantage is that this assignment can be very specific, such as a GPCR having serotonin as a ligand. The disadvantage is that it is difficult to know when the query is too distant from the hit, and that the inference of serotonin binding is therefore incorrect. A family database search, on the other hand, will generally be correct in associating a sequence with a family, but cannot capture the specificity of function in divergent families. For example, there are members of the aldo-keto reductase family that function as ion channel subunits. PANTHER combines the advantages of both methods, by including both family and subfamily models in the HMM library.
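One way to picture the competition between family-level and subfamily-level models is the sketch below. Every model that scores above a cutoff competes, and the level of the winner decides how specific the transferred annotation is. The `Hit` type, the scores, the cutoff value and the returned strings are illustrative assumptions, and the labels only loosely follow the aldo-keto reductase example; this is not the PANTHER scoring code.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    family: str
    subfamily: Optional[str]  # None marks a family-level HMM
    score: float              # assumed convention: higher = better match


def annotate(hits, cutoff):
    """Pick the best significant hit and report how specific the annotation is."""
    significant = [h for h in hits if h.score >= cutoff]
    if not significant:
        return "unclassified"
    best = max(significant, key=lambda h: h.score)
    if best.subfamily is not None:
        # A subfamily model won: the specific name and functions can be transferred.
        return f"{best.family} / {best.subfamily} (specific)"
    # Only the family model won: fall back to the less specific annotation.
    return f"{best.family} (family-level only)"


# Hypothetical scores for two query sequences.
close_query = [
    Hit("aldo-keto reductase", None, 150.0),
    Hit("aldo-keto reductase", "potassium channel beta subunit", 420.0),
]
distant_query = [
    Hit("aldo-keto reductase", None, 90.0),
    Hit("aldo-keto reductase", "potassium channel beta subunit", 60.0),
]

print(annotate(close_query, cutoff=50.0))
print(annotate(distant_query, cutoff=50.0))
```

The second query illustrates the guard against overgeneralization: because the family model outscores the subfamily model, the ion channel function is not transferred to a sequence that is merely a member of the broader family.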
If the best hit is a subfamily HMM, and the HMM score is above the accepted threshold, then a specific annotation can be made, while a family HMM best hit often allows a less specific annotation. Following the example above, a family-level best hit will result in the annotation "aldo-keto reductase 2 family member" and no curated ontology terms, while a subfamily hit results in the annotation "potassium voltage-gated channel, beta subunit ( family 6, subfamily A)", and the ontology associations "voltage-gated potassium channel" (molecular function) and "cation transport" (biological process).

In the current release of PANTHER, all significant HMM scores are stored for each FlyBase Drosophila protein and LocusLink human protein. The classification of each gene product is based on the best HMM score. For non-experts, whenever an HMM score is reported, it is accompanied by a relation icon that indicates the relative certainty of the classification. As the scores become less significant, the probability becomes higher that the classification is in error. Even using a permissive score cutoff of 35 ("distantly related", i.e. the lowest degree of certainty), the total error rate for Drosophila molecular function classifications was shown to be less than 2% (3). Because PANTHER/LIB comprises over HMMs, it is not yet practical to provide a general web interface for HMM scoring of user-defined sequences. However, PANTHER/LIB HMM scoring can be made available as an additional service, or for collaborations.

PANTHER HMM annotations can differ from domain-based HMM annotation

Databases such as Pfam and SMART have used the HMM formalism to provide an extremely useful tool for identifying conserved functional and structural domains in a protein sequence. PANTHER uses HMMs somewhat differently, with the goal of annotating the overall biological function of a protein. Like Pfam and SMART, the PANTHER family-level HMMs often have a functional annotation based on a single domain.
PANTHER subfamily-level HMMs (and many family-level HMMs as well), however, can be more informative than the simple sum of the individual domain annotations. For example, the protein encoded by the human gene HSPG2 contains many different domains, including the LDL receptor A domain, epidermal growth factor repeat-like domains, immunoglobulin-like domains and both laminin B and laminin G domains. Each of these domains is found in different combinations across a variety of proteins having divergent functions. The only one of these domains that can be assigned a consistent function is the laminin-type EGF domain, which has been assigned by InterPro to the Gene Ontology (molecular function) term "structural molecule". By contrast, the highest scoring PANTHER HMM is the subfamily "heparan sulfate proteoglycan perlecan" (CF10574:SF31), which is assigned to the PANTHER/X ontology terms (molecular function) "extracellular matrix glycoprotein", and (biological processes) "cell adhesion" and "cell adhesion-mediated signalling". This is a specific subfamily of the broader PANTHER family "laminin-related" (CF10574), which, like the Pfam laminin B and G domains, is not assigned to any functional terms (Fig. 4A).

Even for single-domain proteins, the PANTHER subfamily HMMs often allow for more specific functional inferences than is possible from more general HMMs such as Pfam and SMART. For example, the CALCR gene product hits the Pfam HMM for the secretin-like seven transmembrane receptor family, which is assigned to the GO molecular function "G protein-coupled receptor". The highest-scoring PANTHER HMM is the subfamily "calcitonin receptor" (CF12011:SF18), which is assigned to "G protein-coupled receptor", as well as to the biological processes "skeletal development" and "other neuronal activities". The more specific assignments are correct for this subfamily but not for all members in the larger family (Fig. 4B).
ACKNOWLEDGEMENTS

We thank Kimmen Sjolander, Gangadharan Subramanian, Mark Yandell, Anthony Kerlavage, Richard Mural and Michael Ashburner for helpful discussions. We thank Matteo di Tommaso, James Jordan, Brian Karlak and Bruce Moxon for critical software engineering assistance. We also thank the many biologists who helped to curate the PANTHER library.
REFERENCES

1. Thomas,P.D., Campbell,M.J., Kejariwal,A., Mi,H., Karlak,B., Daverman,R., Diemer,K. and Muruganujan,A. PANTHER: a library of protein families and subfamilies indexed by function, submitted.
2. Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J. et al. (2001) The sequence of the human genome. Science, 291.
3. Mi,H., Vandergriff,J., Campbell,M., Narechania,A., Lewis,S., Thomas,P.D. and Ashburner,M. Assessment of genome-wide protein function classification for Drosophila melanogaster, submitted.
4. Sonnhammer,E.L., Eddy,S.R. and Durbin,R. (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins, 28.
5. Schultz,J., Milpetz,F., Bork,P. and Ponting,C.P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl Acad. Sci. USA, 95.
6. Pruitt,K.D., Katz,K.S., Sicotte,H. and Maglott,D.R. (2000) Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet., 16.
7. Kerlavage,A., Bonazzi,V., di Tommaso,M., Lawrence,C., Li,P., Mayberry,F., Mural,R., Nodell,M., Yandell,M., Zhang,J. and Thomas,P.D. (2002) The Celera Discovery System. Nucleic Acids Res., 30.
8. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T., Harris,M.A., Hill,D.P., Issel-Tarver,L., Kasarskis,A., Lewis,S., Matese,J.C., Richardson,J.E., Ringwald,M., Rubin,G.M. and Sherlock,G. (2000) Gene ontology: tool for the unification of biology. Nature Genet., 25.
9. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215.
10. Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in Nucleic Acids Res., 28.
Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains
More informationResearch Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.
Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research
More informationSynteny Portal Documentation
Synteny Portal Documentation Synteny Portal is a web application portal for visualizing, browsing, searching and building synteny blocks. Synteny Portal provides four main web applications: SynCircos,
More informationUsing Bioinformatics to Study Evolutionary Relationships Instructions
3 Using Bioinformatics to Study Evolutionary Relationships Instructions Student Researcher Background: Making and Using Multiple Sequence Alignments One of the primary tasks of genetic researchers is comparing
More informationLarge-Scale Genomic Surveys
Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction
More informationThe MANTiS Manual. Contents. MANTiS Version 1.1
The MANTiS Manual MANTiS Version 1.1 Contents Connection to the MANTiS database... 2 Memory settings... 2 Main functionalities... 2 Character Mapping View... 4 Genome content View... 5 Biological processes
More informationMATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME
MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:
More informationMathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007
-2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open
More informationHoltzclaw Ap Biology Guide Answers Ch 46
HOLTZCLAW AP BIOLOGY GUIDE ANSWERS CH 46 PDF - Are you looking for holtzclaw ap biology guide answers ch 46 Books? Now, you will be happy that at this time holtzclaw ap biology guide answers ch 46 PDF
More informationBLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010
BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for
More informationSoyBase, the USDA-ARS Soybean Genetics and Genomics Database
SoyBase, the USDA-ARS Soybean Genetics and Genomics Database David Grant Victoria Carollo Blake Steven B. Cannon Kevin Feeley Rex T. Nelson Nathan Weeks SoyBase Site Map and Navigation Video Tutorials:
More informationTutorial. Getting started. Sample to Insight. March 31, 2016
Getting started March 31, 2016 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com Getting started
More informationUpdate on human genome completion and annotations: Protein information resource
UPDATE ON GENOME COMPLETION AND ANNOTATIONS Update on human genome completion and annotations: Protein information resource Cathy Wu 1 and Daniel W. Nebert 2 * 1 Director of PIR, Department of Biochemistry
More informationFramework for a Protein Ontology
Framework for a rotein Ontology TMBIO November 2006 Darren A. Natale, h.d. rotein Science Team Lead, IR Research Assistant rofessor, GUMC GO: ontologies that pertain, in part, to the locations, the processes,
More informationComputational Biology Course Descriptions 12-14
Computational Biology Course Descriptions 12-14 Course Number and Title INTRODUCTORY COURSES BIO 311C: Introductory Biology I BIO 311D: Introductory Biology II BIO 325: Genetics CH 301: Principles of Chemistry
More informationDatabase update 3PFDB+: improved search protocol and update for the identification of representatives of protein sequence domain families
Database update 3PFDB+: improved search protocol and update for the identification of representatives of protein sequence domain families Agnel P. Joseph 1, Prashant Shingate 1,2, Atul K. Upadhyay 1 and
More informationFunctional Annotation
Functional Annotation Outline Introduction Strategy Pipeline Databases Now, what s next? Functional Annotation Adding the layers of analysis and interpretation necessary to extract its biological significance
More informationTiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1
Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with
More informationToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database
ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database Dina Vishnyakova 1,2, 4, *, Julien Gobeill 1,3,4, Emilie Pasche 1,2,3,4 and Patrick Ruch
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationInvestigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST
Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Introduction Bioinformatics is a powerful tool which can be used to determine evolutionary relationships and
More informationCampbell Biology AP Edition 11 th Edition, 2018
A Correlation and Narrative Summary of Campbell Biology AP Edition 11 th Edition, 2018 To the AP Biology Curriculum Framework AP is a trademark registered and/or owned by the College Board, which was not
More informationCAP 5510 Lecture 3 Protein Structures
CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1 Protein Conformation 8/19/2005 Su-Shing Chen, CISE 2 Protein Conformational Structures Hydrophobicity
More informationBIOINFORMATICS: An Introduction
BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and
More information08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega
BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments
More informationGEP Annotation Report
GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationGene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein The parsimony principle: A quick review Find the tree that requires the fewest
More informationBioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing
Bioinformatics Proteins II. - Pattern, Profile, & Structure Database Searching Robert Latek, Ph.D. Bioinformatics, Biocomputing WIBR Bioinformatics Course, Whitehead Institute, 2002 1 Proteins I.-III.
More informationBiological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor
Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms
More informationThe CATH Database provides insights into protein structure/function relationships
1999 Oxford University Press Nucleic Acids Research, 1999, Vol. 27, No. 1 275 279 The CATH Database provides insights into protein structure/function relationships C. A. Orengo, F. M. G. Pearl, J. E. Bray,
More informationIntroduction to Bioinformatics Online Course: IBT
Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple
More informationCross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic
Cross Discipline Analysis made possible with Data Pipelining J.R. Tozer SciTegic System Genesis Pipelining tool created to automate data processing in cheminformatics Modular system built with generic
More informationBLAST: Target frequencies and information content Dannie Durand
Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences
More informationGO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations
Database, 2016, 1 8 doi: 10.1093/database/baw027 Original article Original article GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations Amaia Sangrador-Vegas
More informationProtoNet 4.0: A hierarchical classification of one million protein sequences
ProtoNet 4.0: A hierarchical classification of one million protein sequences Noam Kaplan 1*, Ori Sasson 2, Uri Inbar 2, Moriah Friedlich 2, Menachem Fromer 2, Hillel Fleischer 2, Elon Portugaly 2, Nathan
More informationPGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species
PGA: A Program for Genome Annotation by Comparative Analysis of Maximum Likelihood Phylogenies of Genes and Species Paulo Bandiera-Paiva 1 and Marcelo R.S. Briones 2 1 Departmento de Informática em Saúde
More informationGene function annotation
Gene function annotation Paul D. Thomas, Ph.D. University of Southern California What is function annotation? The formal answer to the question: what does this gene do? The association between: a description
More informationA bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family
A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family Jieming Shen 1,2 and Hugh B. Nicholas, Jr. 3 1 Bioengineering and Bioinformatics Summer
More informationBiology Assessment. Eligible Texas Essential Knowledge and Skills
Biology Assessment Eligible Texas Essential Knowledge and Skills STAAR Biology Assessment Reporting Category 1: Cell Structure and Function The student will demonstrate an understanding of biomolecules
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationLecture 2. The Blast2GO annotation framework
Lecture 2 The Blast2GO annotation framework Annotation steps Modulation of annotation intensity Export/Import Functions Sequence Selection Additional Tools Functional assignment Annotation Transference
More informationSTAAR Biology Assessment
STAAR Biology Assessment Reporting Category 1: Cell Structure and Function The student will demonstrate an understanding of biomolecules as building blocks of cells, and that cells are the basic unit of
More information2 Spial. Chapter 1. Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6. Pathway level. Atomic level. Cellular level. Proteome level.
2 Spial Chapter Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6 Spial Quorum sensing Chemogenomics Descriptor relationships Introduction Conclusions and perspectives Atomic level Pathway level Proteome
More informationA model for the evaluation of domain based classification of GPCR
4(4): 138-142 (2009) 138 A model for the evaluation of domain based classification of GPCR Tannu Kumari *, Bhaskar Pant, Kamalraj Raj Pardasani Department of Mathematics, MANIT, Bhopal - 462051, India;
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationBig Idea 1: The process of evolution drives the diversity and unity of life.
Big Idea 1: The process of evolution drives the diversity and unity of life. understanding 1.A: Change in the genetic makeup of a population over time is evolution. 1.A.1: Natural selection is a major
More informationPrediction of protein function from sequence analysis
Prediction of protein function from sequence analysis Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy The omic era Genome Sequencing Projects: Archaea: 74 species In Progress:52 Bacteria:
More informationComparative RNA-seq analysis of transcriptome dynamics during petal development in Rosa chinensis
Title Comparative RNA-seq analysis of transcriptome dynamics during petal development in Rosa chinensis Author list Yu Han 1, Huihua Wan 1, Tangren Cheng 1, Jia Wang 1, Weiru Yang 1, Huitang Pan 1* & Qixiang
More information