
2 Recap
We have:
- Assembled six genomes
- Made predictions of the most likely gene locations
We will:
- Add layers of biological meaning to the sequences

3 Start with Biology This will motivate the choices we make in picking tools for bioinformatics later on

4 Pal, Debnath. (2006) On gene ontology and function annotation.

7 Functional annotation
Assigning biological meaning to sequence information
Types of genomic features (in order of increasing scale):
- Short sequences
- Genes: naming and function description
- Control of expression: promoters
- Operons
- Pathways/Networks

8 Short sequences AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC

9 Short sequences
Four main categories:
- Dispersed repeat motifs
  - Competence signals
  - Promoter regions
- Homopolymer tracts
- Short-motif SSRs (simple sequence repeats): repeats of 2-6 bases
  - Have been shown to modulate virulence
  - Informative in epidemiological studies, for phylogeny, etc.
  - Knock-out of these regions
- Long-motif SSRs: repeats of 8+ bases

10 Dispersed repeat motifs
- Competence signal: AAGTGCGGT is one signal for the competence machinery in Haemophilus influenzae
- Promoters: TATA box (Pribnow box, consensus TATAAT) at -10; TTGACA at -35, which allows for high transcription
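A minimal sketch of how such a short motif could be located in a sequence, using the example sequence shown on the slides. The function names are hypothetical, and the sketch assumes uppercase A/C/G/T input and exact matching only; a real motif finder would also allow mismatches.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif):
    """Return (strand, 0-based start) for every exact, possibly overlapping, hit.

    Minus-strand hits are reported at the forward-strand position where the
    reverse complement of the motif starts.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(pattern, start + 1)
    return hits


# Example sequence from the slides and the competence signal named above.
EXAMPLE = "AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC"
COMPETENCE_MOTIF = "AAGTGCGGT"
hits = find_motif(EXAMPLE, COMPETENCE_MOTIF)
print(hits)
```

In the example sequence the motif occurs once, on the forward strand, starting at 0-based position 21.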

11 Genes gena AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC

12 Gene naming So the LORD God formed out of the ground various wild animals and various birds of the air, [and protein coding genes,] and he brought them to the man to see what he would call them; whatever the man called each of them would be its name. Genesis 2:19

13 Gene naming

14 Gene naming
Gene ontology:
- Study of what the gene is
- Assigning putative function
How is this helpful?
- Facilitates communication of a large amount of information: 2000 genes times 6 genomes
- Confirmation of experimental data from the CDC
- Allows for comparative analysis

15 Gene ontology
Sub-domains:
- Molecular function: elemental activities of a gene product at the molecular level
  - Binding
  - Catalysis
- Biological processes: sets of molecular events with a defined beginning and end
  - E.g. induction of cell death
- Cellular components: the parts of a cell or its extracellular environment
Source: An Introduction to the Gene Ontology.

16 Operons gena genb genc gena AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC

17 What is an operon?
Operon: a cluster of structural genes that are expressed as a group, together with their associated promoter and operator. In addition to being physically close in the genome, these genes are regulated such that they are all turned on or off together.

19 lac operon in E. coli

22 Operons in Haemophilus influenzae
- hitABC: periplasmic iron transport operon, encoding a classic high-affinity iron acquisition system.
- dprABC: genes required for efficient processing of linear DNA during cellular transformation.

23 Why operons are important Bacteria respond to changing environments by altering their gene expression patterns; thus, they express different enzymes depending on the carbon sources and other nutrients available to them. Grouping related genes under a common control mechanism allows bacteria to rapidly adapt to changes in the environment.

24 Functional networks some function gena genb genc gena AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC

25 Examples of functions
[Figure: reaction scheme; only the labels 6x, 6x, hν, 1x, 6x survive]

26 Metabolism
[Figure: Gibbs free energy versus reaction coordinate; annotations "to breathe" and "gotta work"]
$$n_{X_1} X_1 + n_{X_2} X_2 + \dots + n_{X_K} X_K \rightarrow n_{Y_1} Y_1 + n_{Y_2} Y_2 + \dots + n_{Y_J} Y_J$$
Role of proteins in metabolism: help get over the free energy barrier!

27 Metabolism
$$n_{X_1} X_1 + n_{X_2} X_2 + \dots + n_{X_K} X_K \rightarrow n_{Y_1} Y_1 + n_{Y_2} Y_2 + \dots + n_{Y_J} Y_J$$
From A. Goelzer, et al. BMC Systems Biology 2008, 2:20

28 Flagellar biogenesis and chemotaxis
Modifications to DNA sequences (and thus the functional network) can result in phenotypic changes.
[Figure panels: WT, Tumble mutant, Speed mutant]
Left figure from S. Kalir, et al. Science 292, 2080 (2001)

29 Competence and transformation

30 From biology to bioinformatics
GENOME DATA + GENE PREDICTIONS
- Small sequences
- Genes
- Operons
- Networks/Pathways
FINAL ANNOTATION

31 Simple Pipeline for Short Sequences
AGTGTTCTGATTACTGGGACTAAGTGCGGTACGTACGATGAGTCGATCAAATGCGTGC
Genome data -> Motif finder (ab initio patterns, database) -> Statistical analysis -> Final annotation
"The computer should be doing the hard work. That's what it's paid to do, after all." ~Larry Wall
"The most important point is that the biases in the distributions [of sequence motifs] need to be supported by some statistical analyses. Some sort of goodness-of-fit such as chi-square with an appropriate correction for multiple tests should suffice." ~King Jordan
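The statistical step suggested in the quote could be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: each genome position is treated as either "motif" or "not motif" (one degree of freedom), the expected count comes from some background model supplied by the caller, and Bonferroni is used as one possible multiple-test correction. Function names and the example numbers are hypothetical.

```python
import math


def chi_square_1df(observed, expected, n_positions):
    """Chi-square goodness of fit for motif vs. non-motif positions (1 d.f.).

    Returns (statistic, p_value).
    """
    obs = (observed, n_positions - observed)
    exp = (expected, n_positions - expected)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    # For one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical numbers: 20 hits observed where 10 are expected in 1000 positions,
# with 50 motifs tested in total.
chi2, p = chi_square_1df(observed=20, expected=10.0, n_positions=1000)
adjusted = bonferroni(p, n_tests=50)
print(chi2, p, adjusted)
```

A motif would then be kept as over-represented only if the adjusted p-value stays below the chosen significance threshold.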

32 Pipeline overview (flowchart)
- Input: GENE PREDICTION RESULTS
- Analyze overlaps: identify overlaps and store for future analysis
- Prediction confidence: High + Medium; Low
- Gene level: BLASTn, BLASTx; Pangenome, Panproteome; Haemophilus database
- Transcript level, extrinsic: InterProScan; Reverse PSI-BLAST, BLASTx; CDD, UniProt; BLASTx NR; Consensus
- Transcript level, intrinsic: SignalP, LipoP, TMHMM
- Results: Molecular Function, Cellular Component, GO terms
- Level 1: Small Sequence Pipeline
- Level 2: Operon (DOORS, OperonDB)
- Pathways: KEGG, Pathway Tools
- Output: FINAL ANNOTATION

33 Understanding the Gene Pipeline gena Homology and BLAST InterProScan Ab initio Methods

34 Homology and BLAST
- Homology is sequence similarity due to common ancestry.
- BLAST: a heuristic algorithm for matching similar sequences.
  - blastn, blastp
  - blastx, tblastn, tblastx
  - RPS-BLAST

35 Steps of BLAST
1. Filter out low-complexity repeats, which may give statistically significant but biologically uninteresting results
2. Generate a list of all words in the query (length 3 for amino acid queries, 11 for nucleotide queries)
3. Precompute all possible high-scoring matches to these words and use this expanded word list as the query
4. Search the database for sequences containing two nearby exact matches
5. Score hits
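The word-list and two-hit steps can be sketched in a few lines. This is a simplified illustration, not BLAST's implementation: it skips the low-complexity filter and the expansion to high-scoring neighbor words, uses exact word matches only, and the `window` value of 40 is an assumed parameter. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def query_words(query, k=3):
    """Map every length-k word of the query to its list of start positions."""
    words = defaultdict(list)
    for i in range(len(query) - k + 1):
        words[query[i:i + k]].append(i)
    return words


def two_hit_seeds(query, subject, k=3, window=40):
    """Find pairs of non-overlapping exact word hits on the same diagonal.

    Returns (diagonal, first_subject_pos, second_subject_pos) tuples where the
    two hits start between k and `window` positions apart.
    """
    words = query_words(query, k)
    by_diagonal = defaultdict(list)
    for j in range(len(subject) - k + 1):
        for i in words.get(subject[j:j + k], ()):
            by_diagonal[j - i].append(j)
    seeds = []
    for diagonal, positions in by_diagonal.items():
        for first, second in combinations(sorted(positions), 2):
            if k <= second - first <= window:
                seeds.append((diagonal, first, second))
    return seeds


seeds = two_hit_seeds("ACDEFGHIK", "WWACDEFGHIKWW")
print(seeds[:3])
```

In a full implementation each seed would then be extended into an alignment and scored with a substitution matrix.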

36 Scoring Matrices
PAM: calculated from a model of evolutionary distance
- Based on alignments of closely related sequences
- PAM1: substitution probabilities at the distance where 1 amino acid in 100 is expected to undergo substitution
- PAM(N) = (PAM1)^N
- PAM120 is considered good for scoring closely related sequences
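The relation PAM(N) = (PAM1)^N is a matrix power of the one-step substitution probability matrix. The sketch below shows the idea with a toy two-state matrix; the values are invented for illustration and are not the real 20x20 PAM1 probabilities. Actual PAM scoring matrices are log-odds scores derived from such probabilities, a step not shown here.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    size = len(m)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    base = [row[:] for row in m]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Toy "PAM1": each of two residue types has a 1% chance of substitution per step.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_pam120 = mat_pow(TOY_PAM1, 120)
print(toy_pam120)
```

Each row of the result still sums to 1, and the off-diagonal entries grow with N as substitutions accumulate.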

37 Scoring Matrices
BLOSUM: derived from the BLOCKS database
- Blocks were sorted into closely related clusters
- Frequencies of substitutions between clusters within a family were used to calculate the probability of a meaningful substitution
- BLOSUM(N): N is the cutoff value for the percentage sequence identity that defines the clusters

38 Database
Look for hits in related genomes (expected functional relationship):
- H. flu
- Haemophilus pan-genome
- Pasteurellaceae family: may contain more closely related organisms than Haemophilus

39 Blastn, Blastx
- 80% identity
- If a gene encodes a protein, blastx is expected to be better: the amino acid sequence is more complex and contains more functional information
- Frameshift due to sequencing error: blastn would still hit, blastx would fail
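The frameshift point can be illustrated with a toy translation. The sketch assumes the standard genetic code and uses a made-up 18-base sequence; it only shows why a single-base deletion scrambles the downstream protein while leaving the nucleotide sequence almost unchanged. Names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate uppercase DNA in frame 1, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


gene = "ATGGCCAAAGGTTTCGAT"          # made-up example sequence
shifted = gene[:3] + gene[4:]        # simulate a one-base deletion (sequencing error)

print(translate(gene))     # protein from the intact sequence
print(translate(shifted))  # protein after the frameshift
```

The two nucleotide strings differ by one base, so a nucleotide search would still align them, but the translated proteins diverge right after the deletion.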

40 RPS-BLAST
- Identifies conserved domains in proteins
- Compares a protein sequence to a database of position-specific scoring matrices (PSSMs)
- Uses the substitution frequency at each position in MSAs of recognized conserved domains
- From SMART, Pfam, LOAD
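Scoring a sequence against a PSSM can be sketched as sliding the matrix along the protein and summing per-position scores. The tiny three-column matrix, its scores, and the default penalty below are invented for illustration; real PSSMs have one score for each of the 20 amino acids at every position, and RPS-BLAST uses BLAST-style seeding rather than this exhaustive scan.

```python
def best_pssm_match(sequence, pssm, default=-4):
    """Slide a PSSM along a sequence and return (best_score, start).

    `pssm` is a list of {residue: score} columns; residues missing from a
    column get `default`. Returns None if the sequence is shorter than the PSSM.
    """
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical 3-position PSSM.
TOY_PSSM = [{"G": 5, "A": 1}, {"K": 4, "R": 3}, {"S": 5, "T": 4}]
match = best_pssm_match("MAGKTLL", TOY_PSSM)
print(match)
```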

41 InterPro
- Database of databases: 13 officially integrated
- Signatures derived from the collection represent domains, families, functional sites, etc.
- Manually curated

42 HMM Databases
- PIRSF: superfamilies based on evolutionary relationships
- TIGRFAMs: functionally equivalent proteins (equivalogs)
- PANTHER: divergence of function within families

43 HMM DB continued
- Pfam: protein families based on functional regions
- Gene3D: structural annotation; extends the CATH structural domain database
- SUPERFAMILY: structural annotation; SCOP structural domain database

44 Profiles & Patterns
- HAMAP: identifies conserved prokaryotic protein families and subfamilies
- PROSITE
  - Profiles predict structural properties of proteins
  - Patterns predict protein function

45 Clusters and Fingerprints
- ProDom: sequence clusters built from UniProtKB
- PRINTS: conserved motifs used as fingerprints

46 Integration into InterPro
Signature Database | Version*** | Signatures* | Integrated Signatures**
- GENE3D | 3.3.0*
- HAMAP
- PANTHER
- PIRSF
- PRINTS
- PROSITE patterns | 20.66*
- PROSITE profiles | 20.66*
- Pfam
- PfamB
- ProDom
- SMART
- SUPERFAMILY | 1.73*
- TIGRFAMs | 9.0*
* Some signatures may not have matches to UniProtKB proteins.
** Not all signatures of a member database may be integrated at the time of an InterPro release.
*** InterPro is using an older version of the DBs marked with a * symbol.
Data based on InterPro release 31.0, 9th February 2011 (link)

47 Integration continued
InterPro and UniProtKB
Sequence Database | Version | Count | Proteins matching any signature | Proteins matching integrated signatures
- UniProtKB | 2011_ | | (85.5%) | (79.3%)
- UniProtKB/TrEMBL | 2011_ | | (85.0%) | (78.7%)
- UniProtKB/Swiss-Prot | 2011_ | | (97.2%) | (95.3%)
InterPro to GO: 24,236 GO terms mapped to InterPro entries

48 InterProScan
- A suite of tools: ScanRegExp, Pfscan, FingerPrintScan, HMMpfam
- Web-based vs. standalone install
  - Run limitations
  - Input limitations
  - Signatures

49 InterProScan Output
- Formats: raw, HTML, GFF3
- Accession numbers: Swiss-Prot, PDB, TrEMBL, member DBs, etc.
- Annotation: GO terms, structural, functional, etc.
- Metadata: literature references, taxonomy, cross-references, etc.

50 Intrinsic Method (Ab initio) SignalP LipoP TMHMM

51 SignalP
SignalP 3.0 service
- Predicts cleavage sites and gives a signal peptide/non-signal peptide prediction
- Covers Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes
- Uses several artificial neural networks and hidden Markov models

52 Biology Background
"Proteins have intrinsic signals that govern their transport and localization in the cell." Günter Blobel
- Signal peptide: cleaved by signal peptidase I (SPase I).
- Signal anchors are "uncleaved signal peptides", which have no SPase recognition site.

53 Data sets The data used for SignalP version 3.0 were extracted from SWISS-PROT version 40

54 Algorithms
Two neural networks: one for predicting the actual signal peptide and one for predicting the position of the signal peptidase I (SPase I) cleavage site.

55 Algorithms The HMM: prediction of signal anchors in addition to the prediction of signal peptides

56 Input

57 Output
- C-score: the "cleavage site" score
- S-score: signal peptide indicator
- Y-score: a better cleavage site prediction

58 Output

59 LipoP
LipoP 1.0 server
- Predictions of lipoproteins
- Gram-negative bacteria only
- HMM

60 Biology background
Prokaryotic lipoprotein cleavage sites are not predicted using SignalP. Prokaryotic lipoproteins are cleaved by a specific lipoprotein signal peptidase, Lsp or signal peptidase II. This peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride-fatty acid lipid is attached. The cleavage sites of these proteins differ considerably from those cleaved by the standard prokaryotic signal peptidase (SPase I).

61 Input/Output

62 TMHMM TMHMM Server v. 2.0 Prediction of transmembrane helices in proteins HMM

63 Input/Output The program takes proteins in FASTA format. It recognizes the 20 amino acids and B, Z, and X, which are all treated equally as unknown. Any other character is changed to X
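The character handling described above can be mimicked before submitting sequences, for example to see in advance how many residues would be read as unknown. This is a sketch of the stated rule, not TMHMM's code: mapping B and Z to X is a simplification of "treated equally as unknown", and uppercasing the input first is an added assumption. The function name is hypothetical.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 amino acids


def normalize_protein(seq):
    """Uppercase a protein string and replace everything outside the 20
    standard amino acids (including B, Z and X) with X."""
    return "".join(ch if ch in STANDARD_AMINO_ACIDS else "X"
                   for ch in seq.upper())


cleaned = normalize_protein("MKBZX*U")
print(cleaned, cleaned.count("X"))
```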

64 Operon Pipeline Tools gena genb genc OperonDB DOORS

65 OperonDB
Operon DataBase
- Relies on conservation of gene order and orientation in two or more species to infer operon structure
- Calculates the probability that gene pairs belong in the same operon
- Needs a training set of genomes
- Input: full sequence + gene loci

66 OperonDB output
Columns: Gene1, Gene2, confidence level

67 Pros/Cons
- Pro: can use the training set to bias the data for the Haemophilus genus
- Con: can only find operons that are conserved in other species as well

68 DOORS
Database for prokaryotic OpeRons
Predicts operons based on the features of gene pairs:
- Intergenic distance
- Distance between adjacent genes' phylogenetic profiles
- Conservation of gene neighborhood
- Similarity score between GO terms of gene pairs
- Frequencies of specific DNA motifs in intergenic regions
Uses the above features to train a linear logistic function-based classifier

69 DOORS input
- Full genome sequence file: FASTA
- Gene location information: GFF
- Protein sequence information: FASTA
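A logistic classifier over gene-pair features has the following general shape. The feature names follow the DOORS slide, but the weights and bias are invented for illustration; they are not values trained or published by DOORS, and the function name is hypothetical.

```python
import math


def operon_pair_probability(features, weights, bias=0.0):
    """Logistic function of a weighted sum of gene-pair features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: short intergenic distance and conserved neighborhood
# push a gene pair toward "same operon".
WEIGHTS = {
    "intergenic_distance": -0.02,
    "profile_distance": -1.5,
    "neighborhood_conservation": 2.0,
    "go_similarity": 1.0,
    "motif_frequency": 0.5,
}

close_pair = {"intergenic_distance": 10, "profile_distance": 0.1,
              "neighborhood_conservation": 0.9, "go_similarity": 0.8,
              "motif_frequency": 0.2}
far_pair = dict(close_pair, intergenic_distance=300)

print(operon_pair_probability(close_pair, WEIGHTS))
print(operon_pair_probability(far_pair, WEIGHTS))
```

With these made-up weights the closely spaced pair scores higher than the same pair placed 300 bases apart; in practice the weights would be fitted on gene pairs with known operon status.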

70 DOORS classification

71 Pros/Cons
- Pro: brings in data from other operon databases (ODB, MicrobesOnline Operon)
- Con: not all operons in DOORS are experimentally verified

72 Functional network tools KEGG : Kyoto Encyclopedia of Genes and Genomes

73 About KEGG
- Initiated in May 1995 under the Human Genome Program of the Ministry of Education, Science, Sports and Culture in Japan.
- Developed by the Kanehisa Laboratory (Bioinformatics Center) in the Institute for Chemical Research, Kyoto University.
- A database resource for understanding higher-order functions and utilities of the biological system, such as the cell or the organism, from genomic and molecular information.

74 Components of KEGG

75 GENES database:
- GenBank + NCBI RefSeq + EMBL + publicly available organism-specific databases
- Genes in high-quality genomes (140 eukaryotes, 1185 bacteria, 95 archaea): 6,290,236 (as of 2011/3/2)
- Internal re-annotation -> SSEARCH
SSDB database: sequence similarity database
- Pre-computed sequence similarity scores + best hits (SSEARCH)
- Generates ortholog clusters and paralog clusters
KO system:
- KO (KEGG Orthology) identifiers, or K numbers
- Pathway-based classification of orthologous genes
- Common identifier for linking genomic to pathway information
- KAAS-SSDB + GFIT + manual verification
PATHWAY mapping and BRITE mapping:
- Based on K numbers, computationally generates organism-specific pathways and BRITE hierarchies.
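The role of K numbers as a common identifier can be sketched as a two-step lookup from genes to K numbers to pathways. All identifiers below are placeholders invented for illustration, not real KEGG entries, and the function is a hypothetical helper rather than part of any KEGG tool.

```python
from collections import defaultdict


def organism_pathways(gene_to_ko, ko_to_pathways):
    """Group an organism's genes by pathway, using K numbers as the shared key."""
    pathways = defaultdict(set)
    for gene, ko in gene_to_ko.items():
        for pathway in ko_to_pathways.get(ko, ()):
            pathways[pathway].add(gene)
    return {name: sorted(genes) for name, genes in pathways.items()}


# Placeholder identifiers for illustration only.
GENE_TO_KO = {"geneA": "K_0001", "geneB": "K_0002",
              "geneC": "K_0001", "geneD": "K_9999"}
KO_TO_PATHWAYS = {"K_0001": ["pathway_1"],
                  "K_0002": ["pathway_1", "pathway_2"]}

mapped = organism_pathways(GENE_TO_KO, KO_TO_PATHWAYS)
print(mapped)
```

Genes whose K number has no pathway assignment, such as geneD here, simply do not appear in any organism-specific pathway.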

76 PATHWAY database
The KEGG PATHWAY database is a collection of manually drawn pathway maps for metabolism, genetic information processing, various other cellular processes, and human diseases.
- KEGG reference pathways (maps): known networks of functional significance.
- Organism-specific pathways: automatically generated by superimposing (coloring) the genes of a given organism.

77 BRITE database
KEGG BRITE is a collection of hierarchical classifications representing our knowledge on various aspects of biological systems. In contrast to KEGG PATHWAY, which is limited to molecular interactions and reactions, KEGG BRITE incorporates many different types of relationships. It includes various biological objects, including molecules, cells, organisms, diseases and drugs, as well as relationships among them.
Mainly aims to automate functional interpretation:
- KEGG pathway reconstruction
- KEGG BRITE mapping: the process of mapping molecular datasets to the BRITE functional hierarchies for biological interpretation of higher-level systemic functions.

78 PATHWAY TOOLS

79 Pathway Tools is a comprehensive symbolic systems biology software system. It is mainly used to create a type of model-organism database (MOD) called a Pathway/Genome Database (PGDB). It provides two ways to interact with the PGDB:
1. Graphical component -> to visualize and update contents
2. Ontology and database API -> allows programs to perform complex queries and data mining on the contents.

80 COMPONENTS
- PathoLogic: creates a new PGDB containing the predicted metabolic pathways of an organism.
- Pathway/Genome Navigator: supports query, visualization, and analysis of PGDBs.
- Pathway/Genome Editors: provide interactive editing capabilities for PGDBs.

81 WORKFLOW (flowchart)
- INPUT FILE: flat-file descriptions of genes and gene products
- PATHOLOGIC
  - Conversion process: converts to PGDB representation
  - Inference process: predicts metabolic pathway complement
- MetaCyc
- Pathway Tools Ontology: groups pathways by functional pathway
- PATHWAY/GENOME EDITOR: provides interactive forms for editing contents (refining, updating, etc.)
- PATHWAY/GENOME NAVIGATOR: query, visualization and analysis of the PGDB
- Roles shown: DEVELOPER, USER

82 It supports
- Development of organism-specific databases
- Computational inferences, including prediction of metabolic pathways, metabolic pathway hole fillers, and operons
- Scientific visualization, including:
  - Automatic display of metabolic pathways and full metabolic networks
  - A genome browser
  - Display of operons, regulons, and full transcriptional regulatory networks
- Visual analysis of omics datasets, such as painting omics data onto diagrams of the full metabolic network, full regulatory network, and full genome
- Comparative analyses of organism-specific databases
- Analysis of biological networks:
  - Interactively tracing metabolites through the metabolic network
  - Finding dead-end metabolites in metabolic networks
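The idea behind dead-end metabolite detection can be sketched on a toy network: a metabolite that is only produced or only consumed cannot carry steady flow. This is a simplified illustration, not the Pathway Tools algorithm; it treats every reaction as irreversible, and the `boundary` argument (metabolites assumed to be taken up or secreted) is an added assumption, as are all names.

```python
def dead_end_metabolites(reactions, boundary=()):
    """Return metabolites that are only produced or only consumed.

    `reactions` is a list of (substrates, products) pairs of metabolite names.
    Metabolites listed in `boundary` are never reported.
    """
    consumed, produced = set(), set()
    for substrates, products in reactions:
        consumed.update(substrates)
        produced.update(products)
    # Symmetric difference: in exactly one of the two sets.
    return sorted((consumed ^ produced) - set(boundary))


# Toy network: A -> B, B -> C + D, C -> E
TOY_REACTIONS = [({"A"}, {"B"}), ({"B"}, {"C", "D"}), ({"C"}, {"E"})]
print(dead_end_metabolites(TOY_REACTIONS))
print(dead_end_metabolites(TOY_REACTIONS, boundary={"A", "E"}))
```

With A and E declared as boundary metabolites, only D is left as a dead end: it is produced but nothing consumes it.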

83 Pipeline overview (flowchart)
- Input: GENE PREDICTION RESULTS
- Analyze overlaps: identify overlaps and store for future analysis
- Prediction confidence: High + Medium; Low
- Gene level: BLASTn, BLASTx; Pangenome, Panproteome; Haemophilus database
- Transcript level, extrinsic: InterProScan; Reverse PSI-BLAST, BLASTx; CDD, UniProt; BLASTx NR; Consensus
- Transcript level, intrinsic: SignalP, LipoP, TMHMM, ProtCompB
- Results: Molecular Function, Cellular Component, GO terms
- Level 1: Small Sequence Pipeline
- Level 2: Operon (DOORS, OperonDB)
- Pathways: KEGG, Pathway Tools
- Output: FINAL ANNOTATION


DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018 DATA ACQUISITION FROM BIO-DATABASES AND BLAST Natapol Pornputtapong 18 January 2018 DATABASE Collections of data To share multi-user interface To prevent data loss To make sure to get the right things

More information

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

Hands-On Nine The PAX6 Gene and Protein

Hands-On Nine The PAX6 Gene and Protein Hands-On Nine The PAX6 Gene and Protein Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of the Pax6: a master gene of eye formation.

More information

Bioinformatics and BLAST

Bioinformatics and BLAST Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists

More information

BIOINFORMATICS: An Introduction

BIOINFORMATICS: An Introduction BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and

More information

Lecture 2. The Blast2GO annotation framework

Lecture 2. The Blast2GO annotation framework Lecture 2 The Blast2GO annotation framework Annotation steps Modulation of annotation intensity Export/Import Functions Sequence Selection Additional Tools Functional assignment Annotation Transference

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Functional Annotation & Comparative Genomics. Lu Wang, Georgia Tech

Functional Annotation & Comparative Genomics. Lu Wang, Georgia Tech Functional Annotation & Comparative Genomics Lu Wang, Georgia Tech Outline Functional annotation What is functional annotation? What needs to be annotated Approaches to functional annotation Pros/cons

More information

A Protein Ontology from Large-scale Textmining?

A Protein Ontology from Large-scale Textmining? A Protein Ontology from Large-scale Textmining? Protege-Workshop Manchester, 07-07-2003 Kai Kumpf, Juliane Fluck and Martin Hofmann Instructive mistakes: a narrative Aim: Protein ontology that supports

More information

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like SCOP all-β class 4-helical cytokines T4 endonuclease V all-α class, 3 different folds Globin-like TIM-barrel fold α/β class Profilin-like fold α+β class http://scop.mrc-lmb.cam.ac.uk/scop CATH Class, Architecture,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

More information

Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple

More information

Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations

Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Sequence Analysis and Structure Prediction Service Centro Nacional de Biotecnología CSIC 8-10 May, 2013 Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Course Notes Instructor:

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

EBI web resources II: Ensembl and InterPro

EBI web resources II: Ensembl and InterPro EBI web resources II: Ensembl and InterPro Yanbin Yin Fall 2015 h.p://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to h.p://www.ebi.ac.uk/interpro/training.html and finish the second online training

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

Update on human genome completion and annotations: Protein information resource

Update on human genome completion and annotations: Protein information resource UPDATE ON GENOME COMPLETION AND ANNOTATIONS Update on human genome completion and annotations: Protein information resource Cathy Wu 1 and Daniel W. Nebert 2 * 1 Director of PIR, Department of Biochemistry

More information

Gene Ontology and overrepresentation analysis

Gene Ontology and overrepresentation analysis Gene Ontology and overrepresentation analysis Kjell Petersen J Express Microarray analysis course Oslo December 2009 Presentation adapted from Endre Anderssen and Vidar Beisvåg NMC Trondheim Overview How

More information

The EcoCyc Database. January 25, de Nitrógeno, UNAM,Cuernavaca, A.P. 565-A, Morelos, 62100, Mexico;

The EcoCyc Database. January 25, de Nitrógeno, UNAM,Cuernavaca, A.P. 565-A, Morelos, 62100, Mexico; The EcoCyc Database Peter D. Karp, Monica Riley, Milton Saier,IanT.Paulsen +, Julio Collado-Vides + Suzanne M. Paley, Alida Pellegrini-Toole,César Bonavides ++, and Socorro Gama-Castro ++ January 25, 2002

More information

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem University of Groningen Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

Integration of functional genomics data

Integration of functional genomics data Integration of functional genomics data Laboratoire Bordelais de Recherche en Informatique (UMR) Centre de Bioinformatique de Bordeaux (Plateforme) Rennes Oct. 2006 1 Observations and motivations Genomics

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Structure to Function. Molecular Bioinformatics, X3, 2006

Structure to Function. Molecular Bioinformatics, X3, 2006 Structure to Function Molecular Bioinformatics, X3, 2006 Structural GeNOMICS Structural Genomics project aims at determination of 3D structures of all proteins: - organize known proteins into families

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

Protein Families. João C. Setubal University of São Paulo Agosto /23/2012 J. C. Setubal

Protein Families. João C. Setubal University of São Paulo Agosto /23/2012 J. C. Setubal Protein Families João C. Setubal University of São Paulo Agosto 2012 8/23/2012 J. C. Setubal 1 Motivation Phytophthora Science paper [Tyler et al., 2006] Comparison of the [P. sojae and P. ramorum] genomes

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

Transitioning BioCyc to a Subscription Model

Transitioning BioCyc to a Subscription Model Transitioning BioCyc to a Subscription Model Peter D. Karp SRI International ecocyc.org biocyc.org metacyc.org BioCyc.org Collection of 9,300 Pathway/Genome Databases Pathway/Genome Database (PGDB) combines

More information

Bio2. Heuristics, Databases ; Multiple Sequence Alignment ; Gene Finding. Biological Databases (sequences) Armstrong, 2007 Bioinformatics 2

Bio2. Heuristics, Databases ; Multiple Sequence Alignment ; Gene Finding. Biological Databases (sequences) Armstrong, 2007 Bioinformatics 2 Bio2 Heuristics, Databases ; Multiple Sequence Alignment ; Gene Finding Biological Databases (sequences) 1 Biological Databases Introduction to Sequence Databases Overview of primary query tools and the

More information

Prediction of protein function from sequence analysis

Prediction of protein function from sequence analysis Prediction of protein function from sequence analysis Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy The omic era Genome Sequencing Projects: Archaea: 74 species In Progress:52 Bacteria:

More information

Supplementary Materials for mplr-loc Web-server

Supplementary Materials for mplr-loc Web-server Supplementary Materials for mplr-loc Web-server Shibiao Wan and Man-Wai Mak email: shibiao.wan@connect.polyu.hk, enmwmak@polyu.edu.hk June 2014 Back to mplr-loc Server Contents 1 Introduction to mplr-loc

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Practical considerations of working with sequencing data

Practical considerations of working with sequencing data Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

More information

Supplementary Information

Supplementary Information Supplementary Information Supplementary Figure 1. Schematic pipeline for single-cell genome assembly, cleaning and annotation. a. The assembly process was optimized to account for multiple cells putatively

More information

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 2 Amino Acid Structures from Klug & Cummings

More information

Hidden Markov Models (HMMs) and Profiles

Hidden Markov Models (HMMs) and Profiles Hidden Markov Models (HMMs) and Profiles Swiss Institute of Bioinformatics (SIB) 26-30 November 2001 Markov Chain Models A Markov Chain Model is a succession of states S i (i = 0, 1,...) connected by transitions.

More information

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17. Genetic Variation: The genetic substrate for natural selection What about organisms that do not have sexual reproduction? Horizontal Gene Transfer Dr. Carol E. Lee, University of Wisconsin In prokaryotes:

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization. 3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of

More information

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

More information

In-Silico Approach for Hypothetical Protein Function Prediction

In-Silico Approach for Hypothetical Protein Function Prediction In-Silico Approach for Hypothetical Protein Function Prediction Shabanam Khatoon Department of Computer Science, Faculty of Natural Sciences Jamia Millia Islamia, New Delhi Suraiya Jabin Department of

More information

Riboflavin Metabolism: A study to see if Mrub_1256 is Orthologous to E. coli b0415, and if Mrub_1254 is Orthologous to E.

Riboflavin Metabolism: A study to see if Mrub_1256 is Orthologous to E. coli b0415, and if Mrub_1254 is Orthologous to E. Augustana College Augustana Digital Commons Meiothermus ruber Genome Analysis Project Biology Winter 2-2016 Riboflavin Metabolism: A study to see if Mrub_1256 is Orthologous to E. coli b0415, and if Mrub_1254

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Gene Regulation and Expression

Gene Regulation and Expression THINK ABOUT IT Think of a library filled with how-to books. Would you ever need to use all of those books at the same time? Of course not. Now picture a tiny bacterium that contains more than 4000 genes.

More information

Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource

Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource Sharpton et al. BMC Bioinformatics 2012, 13:264 RESEARCH ARTICLE Open Access Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource

More information

86 Part 4 SUMMARY INTRODUCTION

86 Part 4 SUMMARY INTRODUCTION 86 Part 4 Chapter # AN INTEGRATION OF THE DESCRIPTIONS OF GENE NETWORKS AND THEIR MODELS PRESENTED IN SIGMOID (CELLERATOR) AND GENENET Podkolodny N.L. *1, 2, Podkolodnaya N.N. 1, Miginsky D.S. 1, Poplavsky

More information