- Paul Williamson
- 5 years ago
2 Recap. We have: assembled six genomes; made predictions of the most likely gene locations. We will: add layers of biological meaning to the sequences.
3 Start with biology. This will motivate the choices we make when picking bioinformatics tools later on.
4 Pal, Debnath. (2006) On gene ontology and function annotation.
5
6 Pal, Debnath. (2006) On gene ontology and function annotation.
7 Functional annotation: assigning biological meaning to sequence information. Types of genomic features, in order of increasing scale: short sequences; genes (naming and function description); control of expression (promoters); operons; pathways/networks.
8 Short sequences AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC
9 Short sequences. Four main categories: dispersed repeat motifs (e.g., competence signals); promoter regions; homopolymer tracts; simple sequence repeats (SSRs). Short-motif SSRs (repeat units of 2-6 bases) have been shown to modulate virulence and are informative in epidemiological studies (phylogeny, etc.). Knock-out of these regions. Long-motif SSRs: repeat units of 8+ bases.
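Homopolymer tracts and short-motif SSRs can be located with regular expressions. The Python sketch below assumes hypothetical thresholds (a homopolymer is a run of at least 6 identical bases; an SSR is a 2-6 base unit repeated at least 4 times); real SSR finders use tuned, motif-length-specific cutoffs.

```python
import re

def find_homopolymers(seq, min_len=6):
    """Return (start, base, length) for runs of one base at least min_len long."""
    seq = seq.upper()
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return [(m.start(), m.group(1), len(m.group(0))) for m in re.finditer(pattern, seq)]

def find_ssrs(seq, min_unit=2, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for tandem repeats of a 2-6 base unit.
    The unit length range and copy threshold are assumptions."""
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        pattern = r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1)
        for m in re.finditer(pattern, seq):
            unit = m.group(1)
            # skip units that are themselves repeats of a shorter unit (e.g. "TT", "ATAT")
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)
```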
10 Dispersed repeat motif: AAGTGCGGT is one signal for the competence (DNA uptake) machinery in Haemophilus influenzae. Promoters: the TATA box (Pribnow box, consensus TATAAT) at -10 and TTGACAT at -35; these elements allow for high transcription.
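A dispersed motif such as AAGTGCGGT can occur on either strand, so a scan has to check the reverse complement as well. A minimal exact-match sketch follows (the function names are illustrative). Real promoter elements such as the -10 and -35 boxes tolerate mismatches, so they would need a scoring approach rather than exact matching.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif(seq, motif):
    """Return (0-based start, strand) for every exact occurrence of motif on
    either strand; starts are reported in forward-strand coordinates."""
    seq, motif = seq.upper(), motif.upper()
    rc = revcomp(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc:  # the motif read on the minus strand
            hits.append((i, "-"))
    return hits
```

On the example sequence shown on slide 8, this reports a single forward-strand hit starting at position 21 (0-based).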
11 Genes gena AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC
12 Gene naming So the LORD God formed out of the ground various wild animals and various birds of the air, [and protein coding genes,] and he brought them to the man to see what he would call them; whatever the man called each of them would be its name. Genesis 2:19
13 Gene naming
14 Gene naming: gene ontology. The study of what a gene is; assigning putative function. How is this helpful? It facilitates communication of a large amount of information (about 2,000 genes × 6 genomes), supports confirmation of experimental data from the CDC, and allows for comparative analysis.
15 Gene ontology sub-domains. Molecular function: elemental activities of a gene product at the molecular level (e.g., binding, catalysis). Biological process: sets of molecular events with a defined beginning and end (e.g., induction of cell death). Cellular component: the parts of a cell or its extracellular environment. (Source: An Introduction to the Gene Ontology.)
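GO terms form a directed acyclic graph, and annotating a gene with a specific term implies all of that term's ancestors (the "true path rule"). The sketch below uses a small is_a fragment written by hand for illustration; the real GO graph has more intermediate terms and other relation types (such as part_of), and the gene name is hypothetical.

```python
# Simplified, hand-written fragment of the molecular-function branch (is_a edges only).
IS_A = {
    "ATP binding": ["nucleotide binding"],
    "nucleotide binding": ["binding"],
    "binding": ["molecular_function"],
    "hydrolase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
}

def ancestors(term, parents):
    """All terms reachable from `term` by following parent edges."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def propagate(annotations, parents):
    """Expand each gene's direct GO terms with all of their ancestors."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }
```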
16 Operons gena genb genc gena AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC
17 What is an operon? An operon is a cluster of structural genes that are expressed as a group, together with their associated promoter and operator. In addition to being physically close in the genome, these genes are regulated such that they are all turned on or off together.
18
19 lac operon in E. coli
20
21
22 Operons in Haemophilus influenzae. hitABC: periplasmic iron transport operon, encoding a classic high-affinity iron acquisition system. dprABC: genes required for efficient processing of linear DNA during cellular transformation.
23 Why operons are important Bacteria respond to changing environments by altering their gene expression patterns; thus, they express different enzymes depending on the carbon sources and other nutrients available to them. Grouping related genes under a common control mechanism allows bacteria to rapidly adapt to changes in the environment.
24 Functional networks some function gena genb genc gena AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC
25 Examples of functions: 6 CO2 + 6 H2O + hν → C6H12O6 + 6 O2 (figure)
26 Metabolism. A general reaction: $n_{X_1} X_1 + n_{X_2} X_2 + \dots + n_{X_K} X_K \rightarrow n_{Y_1} Y_1 + n_{Y_2} Y_2 + \dots + n_{Y_J} Y_J$. Role of proteins in metabolism: help get over the free energy barrier! (Figure: Gibbs free energy along the reaction coordinate.)
27 Metabolism: $n_{X_1} X_1 + n_{X_2} X_2 + \dots + n_{X_K} X_K \rightarrow n_{Y_1} Y_1 + n_{Y_2} Y_2 + \dots + n_{Y_J} Y_J$ (figure from A. Goelzer et al., BMC Systems Biology 2008, 2:20)
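A reaction written in the general form above must be balanced: for every element, the stoichiometric coefficients times the atom counts have to agree on both sides. A sketch of that check follows, assuming simple formulas without parentheses or charges; light (hν) carries no atoms and is left out. Photosynthesis, 6 CO2 + 6 H2O → C6H12O6 + 6 O2, serves as the test case.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def atom_totals(side):
    """side: list of (coefficient, formula) pairs, i.e. the n_X X terms."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total

def is_balanced(reactants, products):
    return atom_totals(reactants) == atom_totals(products)
```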
28 Flagellar biogenesis and chemotaxis. Modifications to DNA sequences (and thus to the functional network) can result in phenotypic changes (figure panels: WT, tumble mutant, speed mutant). Left figure from S. Kalir et al., Science 292, 2080 (2001).
29 Competence and transformation
30 From biology to bioinformatics: genome data + gene predictions → small sequences, genes, operons, networks/pathways → final annotation.
31 Simple pipeline for short sequences: genome data → ab initio patterns / database → motif finder → statistical analysis → final annotation. "The computer should be doing the hard work. That's what it's paid to do, after all." ~Larry Wall. "The most important point is that the biases in the distributions [of sequence motifs] need to be supported by some statistical analyses. Some sort of goodness-of-fit such as chi-square with an appropriate correction for multiple tests should suffice." ~King Jordan
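Following the advice quoted above, a motif count can be compared with its expectation using a chi-square goodness-of-fit test and then corrected for the number of motifs tested. This sketch assumes the expectation comes from independent bases at the sequence's own composition, counts the forward strand only, and ignores the dependence between overlapping positions, so its p-values are approximate. It relies on the fact that a chi-square variable with 1 degree of freedom has survival function erfc(sqrt(x/2)), and uses Bonferroni as the multiple-test correction.

```python
import math

def motif_enrichment(seq, motif):
    """Chi-square (1 df) test of observed vs expected motif starts.
    Assumes every base occurs in seq, otherwise the expectation can be zero."""
    seq, motif = seq.upper(), motif.upper()
    n_positions = len(seq) - len(motif) + 1
    freqs = {b: seq.count(b) / len(seq) for b in "ACGT"}
    p_motif = 1.0
    for base in motif:
        p_motif *= freqs[base]  # independent-bases assumption
    expected = p_motif * n_positions
    observed = sum(seq.startswith(motif, i) for i in range(n_positions))
    # two categories: positions where the motif starts vs. all other positions
    diff2 = (observed - expected) ** 2
    chi2 = diff2 / expected + diff2 / (n_positions - expected)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(X > chi2) for 1 df
    return observed, expected, chi2, p_value

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```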
32 Annotation pipeline (flowchart). Gene prediction results → analyze overlaps (identify overlaps and store them for future analysis) → high + medium vs. low confidence. Gene level: BLASTn and BLASTx against the Haemophilus pan-genome / pan-proteome database. Transcript level, extrinsic: InterProScan; reverse PSI-BLAST (CDD); BLASTx (UniProt); BLASTx (NR); consensus → molecular function, cellular component. Transcript level, intrinsic: SignalP, LipoP, TMHMM. Results: analyze overlaps → final annotation. Further levels (Level 1, Level 2): GO terms; small-sequence pipeline; pathways (KEGG, Pathway Tools); operons (DOORS, OperonDB).
33 Understanding the gene pipeline: homology and BLAST; InterProScan; ab initio methods.
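34 Homology and BLAST. Homology is sequence similarity due to common ancestry. BLAST is a heuristic algorithm for finding similar sequences. Programs: blastn (nucleotide vs. nucleotide) and blastp (protein vs. protein); blastx (translated nucleotide query vs. protein database), tblastn (protein query vs. translated nucleotide database) and tblastx (translated vs. translated); RPS-BLAST.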
35 Steps of BLAST: (1) filter out low-complexity regions, which may give statistically significant but biologically uninteresting results; (2) generate a list of all words in the query (length 3 for amino acid queries, 11 for nucleotide queries); (3) precompute all possible high-scoring matches to these words and use this expanded word list as the query; (4) search the database for sequences containing two nearby exact matches; (5) score the hits.
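The search for two nearby exact matches can be sketched as follows: index the query's words, scan a subject sequence, and keep a hit only when an earlier, non-overlapping hit lies on the same diagonal within a window. This is a simplification: it uses exact word matches only (no neighborhood word expansion), performs no extension or scoring, and the window of 40 residues is an assumed parameter.

```python
from collections import defaultdict

def word_index(query, w):
    """Map every length-w word in the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index

def two_hit_seeds(query, subject, w=3, window=40):
    """Return (query_pos, subject_pos) of second hits that have an earlier,
    non-overlapping hit on the same diagonal within `window` positions."""
    index = word_index(query, w)
    last_hit = {}  # diagonal (subject_pos - query_pos) -> subject pos of previous hit
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            diag = j - i
            prev = last_hit.get(diag)
            if prev is not None and j - prev < w:
                continue  # overlaps the previous hit on this diagonal: ignore it
            if prev is not None and j - prev <= window:
                seeds.append((i, j))
            last_hit[diag] = j
    return seeds
```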
36 Scoring matrices. PAM: calculated from a model of evolutionary distance, based on alignments of closely related sequences. PAM1: substitution probabilities for 1 accepted point mutation per 100 amino acids (1% divergence). PAMN = (PAM1)^N, i.e. the PAM1 matrix multiplied by itself N times. PAM120 is considered good for scoring closely related sequences.
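Because PAMN is the N-th power of PAM1, longer evolutionary distances follow from matrix multiplication. The sketch uses a hypothetical 4-letter alphabet in which each letter changes with total probability 0.01 per step, spread evenly over the other letters; real PAM matrices are 20 × 20, estimated from observed substitutions, and are converted to log-odds scores before use.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_power(m, n):
    """Return m to the power n (n >= 1) by repeated multiplication."""
    result = m
    for _ in range(n - 1):
        result = mat_mult(result, m)
    return result

# Toy 4-letter "PAM1" (assumption): stay with probability 0.99,
# change to each of the other 3 letters with probability 0.01 / 3.
PAM1 = [[0.99 if i == j else 0.01 / 3 for j in range(4)] for i in range(4)]
```

For this toy matrix the diagonal of PAM1^N equals 1/4 + 3/4 (0.99 - 0.01/3)^N, so the probability of seeing the same letter decays toward the random-chance value of 1/4 as N grows.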
37 Scoring matrices. BLOSUM: derived from the BLOCKS database. Sequences within each block were sorted into clusters of closely related sequences; the frequency of substitutions between clusters within a family is used to calculate the probability of a meaningful substitution. BLOSUM N: N is the cutoff value for percentage sequence identity that defines the clusters (e.g., BLOSUM62 clusters sequences that are at least 62% identical).
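The BLOSUM log-odds score compares how often a residue pair is aligned (q_ab) with how often it would be expected by chance (e_ab = p_a p_b for identical residues, 2 p_a p_b otherwise), in half-bit units: s = round(2 log2(q_ab / e_ab)). The sketch counts pairs within hypothetical alignment columns and skips the clustering step, so it covers only the counting stage, not the full BLOSUM procedure.

```python
from collections import Counter
from itertools import combinations
from math import log2

def blosum_like(columns):
    """Half-bit log-odds scores from alignment columns (one string per column)."""
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}
    # background frequency p_a = q_aa + sum over b != a of q_ab / 2
    p = Counter()
    for (a, b), f in q.items():
        p[a] += f / 2
        p[b] += f / 2
    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * log2(f / expected))
    return scores
```

For the three toy columns "AAAA", "AAAS" and "SSSS" it gives A/A = 1, S/S = 2 and A/S = -3.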
38 Database. Look for hits in related genomes, where a functional relationship is expected: H. influenzae; the Haemophilus pan-genome; the Pasteurellaceae family, which may contain more closely related organisms than Haemophilus.
39 BLASTn vs. BLASTx (80% identity). If a gene encodes a protein, blastx is expected to be better: the amino acid sequence is more complex and contains more functional information. With a frameshift due to sequencing error, blastn would still hit, but blastx would fail.
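blastx works by translating the nucleotide query in all six reading frames. A minimal six-frame translation with the standard genetic code (NCBI table 1) is sketched below. A frameshift moves the downstream part of a gene into a different frame, which is why a single blastx alignment breaks at that point.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate codon by codon; a trailing partial codon is dropped and
    codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Return {'+1'..'+3', '-1'..'-3': protein} for a DNA sequence."""
    seq = seq.upper()
    rc = revcomp(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```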
40 RPS-BLAST: identifies conserved domains in proteins. It compares a protein sequence to a database of position-specific scoring matrices (PSSMs), which use the substitution frequencies at each position in multiple sequence alignments (MSAs) of recognized conserved domains, from SMART, Pfam and LOAD.
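At its core, scoring a sequence against a PSSM means adding up one column score per aligned residue. This ungapped sliding-window sketch shows only that core idea, not RPS-BLAST's actual search (which performs gapped local alignment and computes E-values). The toy PSSM and the default score of -1 for residues missing from a column are assumptions.

```python
def best_pssm_hit(seq, pssm, missing=-1):
    """Slide a PSSM (one {residue: score} dict per column) along seq and
    return (best_start, best_score) for the highest-scoring window."""
    width = len(pssm)
    best_start, best_score = None, float("-inf")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(column.get(residue, missing) for column, residue in zip(pssm, window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

# Hypothetical 3-column PSSM.
TOY_PSSM = [{"G": 3, "A": 1}, {"K": 4, "R": 2}, {"S": 3, "T": 2}]
```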
41 InterPro: a database of databases, with 13 member databases officially integrated. Signatures derived from the collection represent domains, families, functional sites, etc. Manually curated.
42 HMM databases. PIRSF: superfamilies based on evolutionary relationships. TIGRFAMs: functionally equivalent proteins (equivalogs). PANTHER: divergence of function within families.
43 HMM databases, continued. Pfam: protein families based on functional regions. Gene3D: structural annotation; extends the CATH structural domain database. SUPERFAMILY: structural annotation based on the SCOP structural domain database.
44 Profiles and patterns. HAMAP: identifies conserved prokaryotic protein families and subfamilies. PROSITE: profiles predict structural properties of proteins; patterns predict protein function.
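PROSITE patterns use a compact syntax: x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, - between elements, and < or > for the N- or C-terminus. A sketch of a converter to Python regular expressions follows, tested on the ATP/GTP-binding P-loop pattern [AG]-x(4)-G-K-[ST]. It does not handle every corner of the syntax (for example > inside brackets).

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden}, optionally with (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Convert a PROSITE pattern string to a Python regular expression."""
    body = pattern.strip().rstrip(".")
    out = []
    for element in body.split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        m = ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"
        if low is not None:
            residue += f"{{{low}}}" if high is None else f"{{{low},{high}}}"
        out.append(prefix + residue + suffix)
    return "".join(out)
```

Applied to the N-terminal Ras segment MTEYKLVVVGAGGVGKSALTIQ, the resulting regex matches GAGGVGKS starting at position 9 (0-based).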
45 Clusters and fingerprints. ProDom: sequence clusters built from UniProtKB. PRINTS: conserved motifs used as fingerprints.
46 Integration into InterPro (table columns: signature database, version***, signatures*, integrated signatures**). Member databases and versions: GENE3D 3.3.0*; HAMAP; PANTHER; PIRSF; PRINTS; PROSITE patterns 20.66*; PROSITE profiles 20.66*; Pfam; PfamB; ProDom; SMART; SUPERFAMILY 1.73*; TIGRFAMs 9.0*. * Some signatures may not have matches to UniProtKB proteins. ** Not all signatures of a member database may be integrated at the time of an InterPro release. *** InterPro uses older versions of the databases marked with *. Data based on the then-current InterPro release 31.0 (9 February 2011).
47 Integration, continued: InterPro and UniProtKB. Proteins matching any signature / matching integrated signatures: UniProtKB (2011_) 85.5% / 79.3%; UniProtKB/TrEMBL (2011_) 85.0% / 78.7%; UniProtKB/Swiss-Prot (2011_) 97.2% / 95.3%. InterPro to GO: 24,236 GO terms mapped to InterPro entries.
48 InterProScan: a suite of tools (ScanRegExp, Pfscan, FingerPrintScan, HMMpfam). Web-based vs. standalone install: run limitations, input limitations, signatures.
49 InterProScan output. Formats: raw, HTML, GFF3. Output: accession numbers (Swiss-Prot, PDB, TrEMBL, member databases, etc.); annotation (GO terms, structural, functional, etc.); metadata (literature references, taxonomy, cross-references, etc.).
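GFF3, one of the output formats, has nine tab-separated columns; attributes are key=value pairs separated by semicolons, multiple values are separated by commas, and special characters are percent-encoded. A minimal line parser is sketched below. The example line in the test is hypothetical and is not copied from real InterProScan output; comment and directive lines (starting with #) are assumed to be filtered out by the caller.

```python
from urllib.parse import unquote

def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine standard columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    attributes = {}
    for item in attrs.split(";"):
        if item:
            key, _, value = item.partition("=")
            attributes[key] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand, "phase": phase, "attributes": attributes,
    }
```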
50 Intrinsic (ab initio) methods: SignalP, LipoP, TMHMM.
51 SignalP. SignalP 3.0 service: a prediction of cleavage sites and a signal peptide / non-signal peptide prediction, for Gram-positive prokaryotes, Gram-negative prokaryotes and eukaryotes, using several artificial neural networks and hidden Markov models.
52 Biology background. "Proteins have intrinsic signals that govern their transport and localization in the cell." (Günter Blobel). Signal peptides are cleaved by signal peptidase I (SPase I). Signal anchors are "uncleaved signal peptides", which have no SPase recognition site.
53 Data sets The data used for SignalP version 3.0 were extracted from SWISS-PROT version 40
54 Algorithms 2 Neural networks: one for predicting the actual signal peptide and one for predicting the position of the signal peptidase I (SPase I) cleavage site.
55 Algorithms The HMM: prediction of signal anchors in addition to the prediction of signal peptides
56 Input
57 Output. C-score: the "cleavage site" score. S-score: the signal peptide indicator. Y-score: a better cleavage site prediction.
58 Output
59 LipoP. LipoP 1.0 server: prediction of lipoproteins; Gram-negative bacteria only; HMM-based.
60 Biology background. Prokaryotic lipoprotein cleavage sites are not predicted by SignalP. Prokaryotic lipoproteins are cleaved by a specific lipoprotein signal peptidase, Lsp or signal peptidase II. This peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue, to which a glyceride-fatty acid lipid is attached. These cleavage sites differ considerably from those cleaved by the standard prokaryotic signal peptidase (SPase I).
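The conserved sequence recognized by signal peptidase II is often called the lipobox. As a rough illustration only, the sketch below searches the N-terminus for one commonly cited form of the consensus, [LVI][ASTVI][GAS]C, and reports the position of the cysteine. Both the consensus and the 40-residue search window are assumptions, and this is not how LipoP works (LipoP uses an HMM).

```python
import re

# Assumed lipobox consensus; the cysteine is the lipidated residue.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")

def lipobox_cys(protein, search_len=40):
    """Return the 0-based index of the cysteine in the first lipobox found
    within the N-terminal `search_len` residues, or None if there is none."""
    m = LIPOBOX.search(protein[:search_len].upper())
    return m.end() - 1 if m else None
```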
61 Input/Output
62 TMHMM. TMHMM Server v. 2.0: prediction of transmembrane helices in proteins; HMM-based.
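To show the decoding idea behind an HMM predictor, here is a two-state toy HMM (membrane vs. other) decoded with the Viterbi algorithm. The hydrophobic residue set and all emission and transition probabilities are assumptions; TMHMM's real model has many more states, with helix-length constraints and inside/outside topology.

```python
import math

HYDROPHOBIC = set("AILMFVWC")  # assumed grouping for this toy model

def viterbi_tm(seq, p_stay=0.95):
    """Most likely state path ('M' = membrane, 'O' = other) under a toy HMM."""
    states = ("M", "O")
    # P(residue is hydrophobic / not) in each state (assumed values)
    emit = {"M": {True: 0.8, False: 0.2}, "O": {True: 0.3, False: 0.7}}
    log = math.log
    trans = {a: {b: log(p_stay if a == b else 1 - p_stay) for b in states} for a in states}
    scores = {s: log(0.5) + log(emit[s][seq[0] in HYDROPHOBIC]) for s in states}
    back = []
    for residue in seq[1:]:
        h = residue in HYDROPHOBIC
        new_scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[p] + trans[p][s])
            new_scores[s] = scores[prev] + trans[prev][s] + log(emit[s][h])
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)
    # trace back from the best final state
    state = max(states, key=scores.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```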
63 Input/Output The program takes proteins in FASTA format. It recognizes the 20 amino acids and B, Z, and X, which are all treated equally as unknown. Any other character is changed to X
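The input rule above is easy to reproduce when preparing sequences. A sketch of a FASTA reader plus a clean-up step that keeps the 20 amino acids and B, Z, X and turns anything else into X; upper-casing first is an assumption, not something the slide specifies.

```python
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY") | set("BZX")

def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}, where the identifier
    is the first word of the header line."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def sanitize(seq):
    """Upper-case (assumption), then replace characters outside ALLOWED with 'X'."""
    return "".join(c if c in ALLOWED else "X" for c in seq.upper())
```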
64 Operon pipeline tools: OperonDB, DOORS.
65 OperonDB (Operon DataBase). Relies on conservation of gene order and orientation in two or more species to infer operon structure; calculates the probability that a gene pair belongs to the same operon; needs a training set of genomes. Input: full sequence + gene loci.
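The signal OperonDB relies on, conserved gene order and orientation, can be illustrated by counting adjacent same-strand gene pairs that recur across genomes. In this sketch genes are hypothetical (family, strand) tuples in chromosome order, with families assumed to be already assigned (for example by ortholog clustering); OperonDB's actual probability model is not reproduced.

```python
from collections import defaultdict

def adjacent_pairs(genes):
    """genes: list of (family_id, strand) in chromosome order.
    Return adjacent same-strand pairs written in transcription order,
    so one operon yields the same pair whichever strand it sits on."""
    pairs = set()
    for (a, sa), (b, sb) in zip(genes, genes[1:]):
        if sa != sb:
            continue
        pairs.add((a, b) if sa == "+" else (b, a))
    return pairs

def conserved_pairs(genomes, min_genomes=2):
    """Pairs that are adjacent, same-strand and identically ordered in
    at least `min_genomes` genomes."""
    counts = defaultdict(int)
    for genes in genomes.values():
        for pair in adjacent_pairs(genes):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_genomes}
```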
66 OperonDB output: gene 1, gene 2, confidence level.
67 Pros/cons. Pro: a training set can be used to bias the predictions toward the Haemophilus genus. Con: it can only find operons that are also conserved in other species.
68 DOORS: Database for prokaryotic OpeRons. Predicts operons from features of adjacent gene pairs: intergenic distance (the distance between adjacent genes); phylogenetic profiles (conservation of gene neighborhood); the similarity score between the GO terms of the gene pair; frequencies of specific DNA motifs in intergenic regions. These features are used to train a linear logistic function-based classifier.
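Intergenic distance alone already gives a crude operon prediction: consecutive genes on the same strand separated by a short gap are grouped together. This single-feature baseline is not DOORS's logistic classifier, and the 50 bp cutoff is an assumed value.

```python
def group_operons(genes, max_gap=50):
    """genes: list of (name, start, end, strand) with 1-based inclusive
    coordinates. Consecutive same-strand genes whose intergenic distance
    is <= max_gap (assumed cutoff) are put in the same group."""
    genes = sorted(genes, key=lambda g: g[1])
    operons = []
    for gene in genes:
        if operons:
            prev = operons[-1][-1]
            gap = gene[1] - prev[2] - 1  # negative when genes overlap
            if gene[3] == prev[3] and gap <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return [[g[0] for g in op] for op in operons]
```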
69 DOORS input: full genome sequence file (FASTA); gene location information (GFF); protein sequence information (FASTA).
70 DOORS classification
71 Pros/cons. Pro: brings in data from other operon databases (ODB, MicrobesOnline Operon). Con: not all operons in DOORS are experimentally verified.
72 Functional network tools. KEGG: Kyoto Encyclopedia of Genes and Genomes
73 About KEGG. Initiated in May 1995 under the Human Genome Program of the Ministry of Education, Science, Sports and Culture in Japan. Developed by the Kanehisa Laboratory (Bioinformatics Center), Institute for Chemical Research, Kyoto University. A database resource for understanding higher-order functions and utilities of the biological system, such as the cell or the organism, from genomic and molecular information.
74 Components of KEGG
75 GENES database: GenBank + NCBI RefSeq + EMBL + publicly available organism-specific databases. Genes in high-quality genomes (140 eukaryotes, 1,185 bacteria, 95 archaea): 6,290,236 (as of 2011/3/2). Internal re-annotation with SSEARCH. SSDB database (sequence similarity database): pre-computed sequence similarity scores and best hits (SSEARCH); generates ortholog and paralog clusters. KO system: KO (KEGG Orthology) identifiers, or K numbers; a pathway-based classification of orthologous genes; a common identifier for linking genomic to pathway information. KO assignment: KAAS; SSDB + GFIT + manual verification. PATHWAY mapping and BRITE mapping: based on K numbers, organism-specific pathways and BRITE hierarchies are generated computationally.
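Once genes carry K numbers, organism-specific pathways come from looking up which reference pathways contain each KO. A sketch with hypothetical gene IDs and a deliberately incomplete toy lookup table (K00844, hexokinase, and K01810, glucose-6-phosphate isomerase, both in map00010, glycolysis/gluconeogenesis); a real analysis would use the full KO-to-pathway links from KEGG.

```python
from collections import defaultdict

def map_to_pathways(gene_to_ko, ko_to_pathways):
    """Return {pathway_id: sorted gene IDs} for genes whose KO assignment
    appears in that pathway; genes without a KO (None) are skipped."""
    result = defaultdict(set)
    for gene, ko in gene_to_ko.items():
        for pathway in ko_to_pathways.get(ko, ()):
            result[pathway].add(gene)
    return {pw: sorted(genes) for pw, genes in result.items()}

# Toy, incomplete KO -> pathway table (real KOs belong to more pathways).
KO_TO_PATHWAYS = {"K00844": ["map00010"], "K01810": ["map00010"]}
```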
76 PATHWAY database. The KEGG PATHWAY database is a collection of manually drawn pathway maps for metabolism, genetic information processing, various other cellular processes and human diseases. KEGG reference pathways (maps): known networks of functional significance. Organism-specific pathways: generated automatically by superimposing (coloring) the genes of a given organism on the reference maps.
77 BRITE database. KEGG BRITE is a collection of hierarchical classifications representing our knowledge of various aspects of biological systems. In contrast to KEGG PATHWAY, which is limited to molecular interactions and reactions, KEGG BRITE incorporates many different types of relationships. It includes various biological objects, including molecules, cells, organisms, diseases and drugs, as well as the relationships among them. It mainly aims to automate functional interpretation (KEGG pathway reconstruction and KEGG BRITE mapping). KEGG BRITE mapping is the process of mapping molecular datasets to the BRITE functional hierarchies for biological interpretation of higher-level systemic functions.
78 PATHWAY TOOLS
79 Pathway Tools is a comprehensive symbolic systems biology software system. It is mainly used to create a type of model-organism database (MOD) called a Pathway/Genome Database (PGDB). It provides two ways to interact with a PGDB: (1) a graphical component, to visualize and update contents; (2) an ontology and database API, which allows programs to perform complex queries and data mining on the contents.
80 Components. PathoLogic: creates a new PGDB containing the predicted metabolic pathways of an organism. Pathway/Genome Navigator: supports query, visualization and analysis of PGDBs. Pathway/Genome Editors: provide interactive editing capabilities for PGDBs.
81 Workflow. Input file: flat-file descriptions of genes and gene products. PathoLogic: a conversion process converts the input to the PGDB representation, and an inference process predicts the metabolic pathway complement, using MetaCyc and the Pathway Tools ontology (which groups pathways by function). Developer: the Pathway/Genome Editor provides interactive forms for editing the contents (refining, updating, etc.). User: the Pathway/Genome Navigator supports query, visualization and analysis of the PGDB.
82 It supports: development of organism-specific databases; computational inferences, including prediction of metabolic pathways, metabolic pathway hole fillers and operons; scientific visualization, including automatic display of metabolic pathways and full metabolic networks, a genome browser, and display of operons, regulons and full transcriptional regulatory networks; visual analysis of omics datasets, such as painting omics data onto diagrams of the full metabolic network, the full regulatory network and the full genome; comparative analyses of organism-specific databases; analysis of biological networks, such as interactively tracing metabolites through the metabolic network and finding dead-end metabolites in metabolic networks.
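Dead-end metabolites, one of the network analyses listed above, can be found from reaction stoichiometry alone: a metabolite that is only produced or only consumed is a dead end. A simplified sketch assuming irreversible reactions and a caller-supplied set of external (boundary) metabolites to ignore; a real implementation also has to handle reversible and transport reactions. The reaction and metabolite names are hypothetical.

```python
def dead_end_metabolites(reactions, external=()):
    """reactions: {name: (substrates, products)} for irreversible reactions.
    Return metabolites that are only produced or only consumed, excluding
    those listed as external."""
    produced, consumed = set(), set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    dead = (produced - consumed) | (consumed - produced)
    return sorted(dead - set(external))
```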
83 Annotation pipeline (recap of slide 32), with ProtCompB added to the intrinsic transcript-level tools (SignalP, LipoP, TMHMM, ProtCompB).
More informationIntroduction to Bioinformatics Online Course: IBT
Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple
More informationIntroductory course on Multiple Sequence Alignment Part I: Theoretical foundations
Sequence Analysis and Structure Prediction Service Centro Nacional de Biotecnología CSIC 8-10 May, 2013 Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Course Notes Instructor:
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More informationEBI web resources II: Ensembl and InterPro
EBI web resources II: Ensembl and InterPro Yanbin Yin Fall 2015 h.p://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to h.p://www.ebi.ac.uk/interpro/training.html and finish the second online training
More informationBasic Local Alignment Search Tool
Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses
More informationUpdate on human genome completion and annotations: Protein information resource
UPDATE ON GENOME COMPLETION AND ANNOTATIONS Update on human genome completion and annotations: Protein information resource Cathy Wu 1 and Daniel W. Nebert 2 * 1 Director of PIR, Department of Biochemistry
More informationGene Ontology and overrepresentation analysis
Gene Ontology and overrepresentation analysis Kjell Petersen J Express Microarray analysis course Oslo December 2009 Presentation adapted from Endre Anderssen and Vidar Beisvåg NMC Trondheim Overview How
More informationThe EcoCyc Database. January 25, de Nitrógeno, UNAM,Cuernavaca, A.P. 565-A, Morelos, 62100, Mexico;
The EcoCyc Database Peter D. Karp, Monica Riley, Milton Saier,IanT.Paulsen +, Julio Collado-Vides + Suzanne M. Paley, Alida Pellegrini-Toole,César Bonavides ++, and Socorro Gama-Castro ++ January 25, 2002
More informationComputational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem
University of Groningen Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's
More informationIntegration of functional genomics data
Integration of functional genomics data Laboratoire Bordelais de Recherche en Informatique (UMR) Centre de Bioinformatique de Bordeaux (Plateforme) Rennes Oct. 2006 1 Observations and motivations Genomics
More informationGrundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)
More informationStructure to Function. Molecular Bioinformatics, X3, 2006
Structure to Function Molecular Bioinformatics, X3, 2006 Structural GeNOMICS Structural Genomics project aims at determination of 3D structures of all proteins: - organize known proteins into families
More informationSequence Alignment Techniques and Their Uses
Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this
More informationProtein Families. João C. Setubal University of São Paulo Agosto /23/2012 J. C. Setubal
Protein Families João C. Setubal University of São Paulo Agosto 2012 8/23/2012 J. C. Setubal 1 Motivation Phytophthora Science paper [Tyler et al., 2006] Comparison of the [P. sojae and P. ramorum] genomes
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationIntroduction to Evolutionary Concepts
Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq
More informationTransitioning BioCyc to a Subscription Model
Transitioning BioCyc to a Subscription Model Peter D. Karp SRI International ecocyc.org biocyc.org metacyc.org BioCyc.org Collection of 9,300 Pathway/Genome Databases Pathway/Genome Database (PGDB) combines
More informationBio2. Heuristics, Databases ; Multiple Sequence Alignment ; Gene Finding. Biological Databases (sequences) Armstrong, 2007 Bioinformatics 2
Bio2 Heuristics, Databases ; Multiple Sequence Alignment ; Gene Finding Biological Databases (sequences) 1 Biological Databases Introduction to Sequence Databases Overview of primary query tools and the
More informationPrediction of protein function from sequence analysis
Prediction of protein function from sequence analysis Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy The omic era Genome Sequencing Projects: Archaea: 74 species In Progress:52 Bacteria:
More informationSupplementary Materials for mplr-loc Web-server
Supplementary Materials for mplr-loc Web-server Shibiao Wan and Man-Wai Mak email: shibiao.wan@connect.polyu.hk, enmwmak@polyu.edu.hk June 2014 Back to mplr-loc Server Contents 1 Introduction to mplr-loc
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More informationSupplementary Information
Supplementary Information Supplementary Figure 1. Schematic pipeline for single-cell genome assembly, cleaning and annotation. a. The assembly process was optimized to account for multiple cells putatively
More informationAmino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1
Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 2 Amino Acid Structures from Klug & Cummings
More informationHidden Markov Models (HMMs) and Profiles
Hidden Markov Models (HMMs) and Profiles Swiss Institute of Bioinformatics (SIB) 26-30 November 2001 Markov Chain Models A Markov Chain Model is a succession of states S i (i = 0, 1,...) connected by transitions.
More informationGenetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.
Genetic Variation: The genetic substrate for natural selection What about organisms that do not have sexual reproduction? Horizontal Gene Transfer Dr. Carol E. Lee, University of Wisconsin In prokaryotes:
More informationBiological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor
Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms
More informationMiGA: The Microbial Genome Atlas
December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From
More informationSara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)
Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More information3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.
3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of
More informationHomology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB
Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded
More informationIn-Silico Approach for Hypothetical Protein Function Prediction
In-Silico Approach for Hypothetical Protein Function Prediction Shabanam Khatoon Department of Computer Science, Faculty of Natural Sciences Jamia Millia Islamia, New Delhi Suraiya Jabin Department of
More informationRiboflavin Metabolism: A study to see if Mrub_1256 is Orthologous to E. coli b0415, and if Mrub_1254 is Orthologous to E.
Augustana College Augustana Digital Commons Meiothermus ruber Genome Analysis Project Biology Winter 2-2016 Riboflavin Metabolism: A study to see if Mrub_1256 is Orthologous to E. coli b0415, and if Mrub_1254
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationGene Regulation and Expression
THINK ABOUT IT Think of a library filled with how-to books. Would you ever need to use all of those books at the same time? Of course not. Now picture a tiny bacterium that contains more than 4000 genes.
More informationSifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource
Sharpton et al. BMC Bioinformatics 2012, 13:264 RESEARCH ARTICLE Open Access Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource
More information86 Part 4 SUMMARY INTRODUCTION
86 Part 4 Chapter # AN INTEGRATION OF THE DESCRIPTIONS OF GENE NETWORKS AND THEIR MODELS PRESENTED IN SIGMOID (CELLERATOR) AND GENENET Podkolodny N.L. *1, 2, Podkolodnaya N.N. 1, Miginsky D.S. 1, Poplavsky
More information