functional annotation preliminary results


1 Functional Annotation: Preliminary Results
March 16, 2016
Alicia Francis, Andrew Teng, Chen Guo, Devika Singh, Ellie Kim, Harshmi Shah, James Moore, Jose Jaimes, Nadav Topaz, Namrata Kalsi, Petar Penev, Tannishtha Som

2 Overview: recap; background; functional annotation; final tools; modified workflow


4 Functional Annotation
Functional annotation consists of attaching biological information to genomic elements. The goal is to better understand the function of the genes, and of their respective proteins, within the organism.
Information generally includes: biochemical function; biological function; regulatory functions and interactions; expression.


6 NT (Nontypeable) Haemophilus influenzae
Gram-negative bacterium; facultatively anaerobic; requires NAD and hemin to grow.
First free-living organism to have its genome sequenced.
Virulence and diseases: otitis media, sinusitis, conjunctivitis, and exacerbations of chronic obstructive pulmonary disease; ear infections in children and bronchitis in adults; may also cause invasive disease, such as bacteremia and pneumonia.
Adaptability and transformation: horizontal gene transfer.
Out of 182 genes, 1699 are CDS.


8 Protein and Non-Coding Functions
Protein functions: structural; regulatory; transmembrane; receptor; enzyme; virulence factors; metabolic processes.
Non-coding functions: CRISPRs; operons.

9 Functional Assignment
Name: gene symbol; protein name.
Role: function of the protein in the cell.
Associated information and supporting evidence: domains/motifs; transmembrane regions; orthologous domains; pathways.

10 Levels of Annotation

11 Levels of Annotation 1

12 CRISPR
Clustered regularly interspaced short palindromic repeats (CRISPRs): highly conserved short sequences (24 bps) separated by spacers of a similar length.
Part of acquired immunity in prokaryotes: resistance to bacteriophage invasion.
Present in ~40% of bacterial genomes.
Used for subtyping through analysis of spacers, which show a high degree of polymorphism:
Salmonella (Fabre et al.): spacer content was strongly correlated with both serotype and multilocus sequence typing (MLST) type.
Mycobacterium tuberculosis (Gori et al.): spoligotyping, i.e. PCR amplification of a highly polymorphic direct repeat locus.

13 CRISPR: Pilercr and CRT Consensus Results
M5964 M27986 M28745 M29197 M294 M36564
M7572 M27987 M2877 M2922 M29658 M3658
M154 M28356 M2881 M29227 M29684 M36582
M1618 M2845 M28853 M2937 M29695 M
M2626 M28687 M28888 M29323 M29697 M3666
M2632 M2872 M29179 M29331 M36557 M37982

14 CRISPR: Pilercr
Command line: pilercr -in <input fasta> -out <text file> -seq <consensus sequence>

15 CRISPR: CRT
Command line: java -cp CRT1.2-CLI.jar crt <input fasta> <output file>
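The consensus between the two CRISPR finders can be sketched as a set intersection. This is a minimal sketch, assuming "consensus" means the samples in which both Pilercr and CRT report a CRISPR array; the `consensus_samples` helper and the sample IDs are hypothetical and not taken from the slides.

```python
def consensus_samples(pilercr_samples, crt_samples):
    """Return the sample IDs in which both tools reported a CRISPR array."""
    return sorted(set(pilercr_samples) & set(crt_samples))


# Hypothetical sample IDs, for illustration only.
pilercr_samples = ["sampleA", "sampleB", "sampleD"]
crt_samples = ["sampleB", "sampleC", "sampleD"]

consensus = consensus_samples(pilercr_samples, crt_samples)
print(consensus)
```

Keeping only the intersection is the conservative choice: a sample is retained only when two independent methods agree.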

16 Domains & Motifs
Structure and localization are fundamental for function.
Domains: self-contained, cooperative folding units.
Motifs: short consensus regions, crucial for the function.
Using homology: the same sequence does not guarantee the same function; however, higher sequence similarity = higher probability of the same function.
Using ab initio methods.

17 InterProScan
Scans through the InterPro databases and provides annotation based on homology and gene ontology terms.
Databases: PROSITE, Coils, PIRSF, Pfam, ProDom, Superfamily, Gene3D, SMART, TIGRFAM, PRINTS
Command: ./interproscan.sh -appl (applications to include) -i (input file)
Output: TSV, XML, GFF3
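The per-sample summary (unique annotated genes, percent annotated, hits per member database) can be derived from the TSV output. The sketch below assumes the standard InterProScan 5 TSV layout, where column 1 is the protein accession and column 4 is the analysis (member database) name; the `summarize_interproscan` helper, gene IDs, and signature accessions are hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize_interproscan(tsv_handle, total_genes):
    """Count proteins with at least one hit, overall and per member database."""
    proteins_by_db = defaultdict(set)
    annotated = set()
    for row in csv.reader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        protein_id, analysis = row[0], row[3]
        annotated.add(protein_id)
        proteins_by_db[analysis].add(protein_id)
    percent = 100.0 * len(annotated) / total_genes if total_genes else 0.0
    per_db = {db: len(ids) for db, ids in proteins_by_db.items()}
    return len(annotated), percent, per_db


# Hypothetical rows; only the first four columns matter to this sketch.
example_tsv = (
    "gene1\tmd5a\t120\tPfam\tPF_EXAMPLE1\tExample domain\t5\t100\t1e-10\tT\t16-03-2016\n"
    "gene1\tmd5a\t120\tSMART\tSM_EXAMPLE1\tExample domain\t5\t100\t1e-5\tT\t16-03-2016\n"
    "gene2\tmd5b\t300\tPfam\tPF_EXAMPLE2\tAnother domain\t10\t200\t1e-20\tT\t16-03-2016\n"
)

n_annotated, percent_annotated, per_db = summarize_interproscan(
    io.StringIO(example_tsv), total_genes=4
)
print(n_annotated, percent_annotated, per_db)
```

Counting unique protein IDs rather than rows matters because a single gene usually receives hits from several member databases.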

18 InterProScan Results
[Table. Columns: Sample; Total # Genes; InterPro Unique Annotations; % Annotated. Seven sample rows; values not recoverable.]

19 InterProScan Results
[Table. Columns: Sample; Coils; Gene3D; Pfam; PIRSF; PRINTS; ProDom; PSPatterns; PSProfiles; SMART; SF; TF. Seven sample rows; values not recoverable.]
SF = Superfamilies, TF = TIGRFAM

20 Gram-negative Bacteria: Secretory Pathway
Secretion: transport of proteins, enzymes, and toxins from the interior of the bacterial cell to its exterior.
6 types in gram-negative bacteria:
Type II and V: proteins carry signal peptides.
Type I, III, IV and VI: proteins do not carry signal peptides.
Proteins in the bacterial membrane also use this pathway.
Membrane proteins:
Lipoproteins: function as virulence factors and in nutrient uptake and adhesion.
Transmembrane proteins.
Signal peptidase I: cleavage of preproteins translocated across membranes.
Signal peptidase II: cleavage of bacterial prolipoproteins.

21 Transmembrane Proteins
Types: alpha-helical; beta-barrels.
Topology: orientation of the N-terminus; single-/multi-pass.
Function.
Compared with signal peptides, transmembrane helices have a longer hydrophobic region and no cleavage site.

22 LipoP
Based on a hidden Markov model.
Classifies genes into four classes:
SpI: signal peptide (signal peptidase I)
TMH: N-terminal transmembrane helix
SpII: lipoprotein signal peptide (signal peptidase II)
CYT: cytoplasmic (in practice, all the rest)
Command: LipoP -short [Input.fasta] > [Output.gff]
Example output: (-short summarizes the best prediction for each gene)
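Tallying the four classes per sample could look like the sketch below. The line layout is an assumption ("# <gene id> <class> score=..."), and should be checked against real `-short` output before use; the `tally_lipop_classes` helper and the gene IDs are hypothetical.

```python
from collections import Counter

LIPOP_CLASSES = {"SpI", "SpII", "TMH", "CYT"}


def tally_lipop_classes(lines):
    """Count best-prediction classes, one summary line per gene."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Assumed layout: "# <gene id> <class> score=... margin=... ..."
        if len(fields) >= 3 and fields[0] == "#" and fields[2] in LIPOP_CLASSES:
            counts[fields[2]] += 1
    return counts


# Hypothetical summary lines, for illustration only.
example_lines = [
    "# gene1 SpI score=12.3 margin=4.5 cleavage=20-21",
    "# gene2 SpII score=20.1 margin=8.0 cleavage=18-19 Pos+2=S",
    "# gene3 CYT score=-0.2",
    "# gene4 TMH score=5.0 margin=1.2",
    "# gene5 CYT score=-0.2",
]

class_counts = tally_lipop_classes(example_lines)
print(dict(class_counts))
```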

23 LipoP Results
[Table. Columns: Sample No.; SpI; SpII; TMH; CYT; Total. Seven sample rows; values not recoverable.]

24 SignalP
Predicts signal peptides and their cleavage sites.
Based on neural networks.
Command: signalp -t <type of organism> -f <format> [Input.faa] > <outputfile>
Example output: (short summarizes the best prediction for each gene)

25 SignalP Results
[Table. Columns: Sample No.; Number of signal peptides; Total; % Signal Peptide. Seven sample rows; values not recoverable.]

26 LipoP vs. SignalP
[Table. Columns: Sample No.; LipoP Unique; SignalP Unique; Common; NonSignal; Total. Seven sample rows; values not recoverable.]

27 LipoP vs. SignalP LipoP clearly provides more unique information than SignalP for signal peptides.
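The comparison columns (LipoP Unique, SignalP Unique, Common, NonSignal, Total) reduce to set operations on gene IDs. A minimal sketch, assuming the LipoP set holds genes it classifies as SpI or SpII and the SignalP set holds genes it flags as having a signal peptide; the `compare_predictions` helper and all IDs are hypothetical.

```python
def compare_predictions(lipop_ids, signalp_ids, all_ids):
    """Break down signal-peptide calls by which tool made them."""
    lipop, signalp, everything = set(lipop_ids), set(signalp_ids), set(all_ids)
    return {
        "LipoP unique": len(lipop - signalp),
        "SignalP unique": len(signalp - lipop),
        "Common": len(lipop & signalp),
        "NonSignal": len(everything - lipop - signalp),
        "Total": len(everything),
    }


# Hypothetical gene IDs, for illustration only.
all_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
lipop_calls = ["g1", "g2", "g3"]
signalp_calls = ["g2", "g3", "g4"]

comparison = compare_predictions(lipop_calls, signalp_calls, all_genes)
print(comparison)
```

The four categories partition the gene set, so their counts should always sum to the total; that is a cheap sanity check on real data.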

28 TMHMM
Based on a hidden Markov model approach.
Prediction of transmembrane helices in proteins.
Command: cat <input faa file> | /path/tmhmm -short > <output file>
Output: one summary line per protein (-short).

29 TMHMM Results
[Table. Columns: Sample No.; Number of TM Proteins; Total; % TMH. Seven sample rows; values not recoverable.]
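The columns of this table (number of TM proteins, total, % TMH) can be computed from the short-format output. The sketch assumes the TMHMM 2.0 short layout of one tab-separated line per protein with `key=value` fields, including `PredHel` (number of predicted helices); the `count_tm_proteins` helper and the example lines are hypothetical.

```python
def count_tm_proteins(lines):
    """Return (proteins with >= 1 predicted helix, total proteins, percent)."""
    total = 0
    with_helices = 0
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        values = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        total += 1
        if int(values.get("PredHel", 0)) > 0:
            with_helices += 1
    percent = 100.0 * with_helices / total if total else 0.0
    return with_helices, total, percent


# Hypothetical short-format lines, for illustration only.
example_lines = [
    "gene1\tlen=250\tExpAA=131.20\tFirst60=22.40\tPredHel=6\tTopology=i12-34o49-71i92-114o129-151i172-194o209-231i",
    "gene2\tlen=180\tExpAA=0.05\tFirst60=0.02\tPredHel=0\tTopology=o",
]

tm_count, total_count, percent_tmh = count_tm_proteins(example_lines)
print(tm_count, total_count, percent_tmh)
```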

30 LipoP vs. TMHMM
[Table. Columns: Sample No.; LipoP Unique; TMHMM Unique; Common; Total. Seven sample rows; values not recoverable.]

31 LipoP vs. TMHMM
TMHMM predicts more transmembrane proteins than LipoP. This could be because the hydrophobic region of a signal peptide is mistaken for that of a TM protein.
Next step: run Phobius, which claims a lower false-positive rate than either tool.

32 VFDB: Virulence Factors
Virulence factors: molecules produced by bacterial pathogens that contribute to pathogenicity in the host by enabling them to achieve: colonization; immunoevasion; immunosuppression; entrance to the cell; obtaining host nutrients.
Useful for understanding virulence mechanisms and interactions with the host cell.

33 VFDB Output
Identification of protein sequences.
Command line: blastp -db <DatabaseName> -query <InputFile> -outfmt "6 stitle qseqid sseqid sgi qcovs evalue" -out <OutputFile>
Output: tabular (-outfmt 6).
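The tabular BLAST output can then be reduced to one best virulence-factor hit per query gene. A minimal sketch, assuming the six tab-separated columns appear in the order given to `-outfmt`; the E-value and query-coverage cutoffs, the `best_vfdb_hits` helper, and the example rows are hypothetical choices, not values from the slides.

```python
import csv
import io

COLUMNS = ["stitle", "qseqid", "sseqid", "sgi", "qcovs", "evalue"]


def best_vfdb_hits(handle, max_evalue=1e-10, min_qcovs=80.0):
    """Keep, per query, the lowest-E-value hit that passes both cutoffs."""
    best = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        evalue, qcovs = float(hit["evalue"]), float(hit["qcovs"])
        if evalue > max_evalue or qcovs < min_qcovs:
            continue
        current = best.get(hit["qseqid"])
        if current is None or evalue < float(current["evalue"]):
            best[hit["qseqid"]] = hit
    return best


# Hypothetical BLAST rows, for illustration only.
example_rows = (
    "Example adhesin\tgene1\tVFG_X1\t0\t95\t1e-50\n"
    "Example toxin\tgene1\tVFG_X2\t0\t90\t1e-20\n"
    "Example pilin\tgene2\tVFG_X3\t0\t40\t1e-30\n"   # fails the coverage cutoff
    "Example porin\tgene3\tVFG_X4\t0\t99\t0.5\n"     # fails the E-value cutoff
)

best_hits = best_vfdb_hits(io.StringIO(example_rows))
print({query: hit["sseqid"] for query, hit in best_hits.items()})
```

Filtering on query coverage as well as E-value avoids calling a virulence factor from a short local match to a single shared domain.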

34 Levels of Annotation 2

35 Operons and Polycistronic mRNA
Operons: a cluster of genes under the control of a single promoter, transcribed together into a single mRNA strand.
Polycistronic mRNA: a single mRNA strand coding for many proteins.

36 Operons
OperonDB: input is .faa and .ptt files; uses Blastp and createoperondb.pl. The .ptt format is no longer used by NCBI.
Alternative approach: download operons for our species of interest from OperonDB and DOOR2; BLAST them against our query sequences; compare results.

37 Levels of Annotation 3

38 Pathways
What is a biological pathway? Biological pathway diagrams describe the biological reactions and interactions in a cell graphically. There are many types of biological pathways; the best known are metabolic pathways, gene-regulation pathways, and signal transduction pathways.
Metabolic pathways: chemical reactions that occur in our bodies.
Gene-regulation pathways: turn genes on and off.
Signal transduction pathways: move a signal from a cell's exterior to its interior.
Why is it important? Computational approaches that incorporate pathway knowledge to interpret high-throughput datasets play a key role in understanding disease mechanisms from genetic studies. Pathway analysis helps scientists generate biologically meaningful hypotheses and allows more comprehensive inferences.

39 Work in Progress
Large (very) database; overlapping methods with other selected tools; MySQL issues; complicated user manual.
Large database (3 GB+); will install separately in the home folder and run it there.

40 Updated Pipeline

41 Exercise

42 Questions?

43 References
Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics, Jan 2014; doi: 10.1093/bioinformatics/btu031.
Krogh A, et al. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology 305(3).
Tusnady GE, Simon I (1998). Principles governing amino acid composition of integral membrane proteins: applications to topology prediction. J. Mol. Biol. 283.
Panwar B, et al. (2014). Prediction and classification of ncRNAs using structural information. BMC Genomics 15:127.
Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8:785-786, 2011.
Chen L, Xiong Z, Sun L, Yang J, Jin Q. VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res. 2012 Jan;40(Database issue):D641-5. doi: 10.1093/nar/gkr989. Epub 2011 Nov 8.
Koskinen P, et al. PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment. Bioinformatics 31.10 (2015).
Edgar RC. PILER-CR: fast and accurate identification of CRISPR repeats. BMC Bioinformatics 8.1 (2007): 1.
Mi H, Lazareva-Ulitsky B, Loo R, et al. The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 2005;33(Database issue):D284-8.
Thomas PD, Campbell MJ, Kejariwal A, et al. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 2003;13(9).
Mi H, Poudel S, Muruganujan A, Casagrande JT, Thomas PD. PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res. 2016;44(D1):D336-42.

44 References
Van Eldere J, et al. Nontypeable Haemophilus influenzae, an under-recognised pathogen. The Lancet Infectious Diseases, Volume 14, Issue 12.
Marraffini LA. CRISPR-Cas immunity in prokaryotes. Nature (2015).
Fabre L, et al. CRISPR typing and subtyping for improved laboratory surveillance of Salmonella infections. Ed. Igor Mokrousov. PLoS ONE 7.5 (2012). PMC. Web. 16 Mar. 2016.

FUNCTION ANNOTATION PRELIMINARY RESULTS

FUNCTION ANNOTATION PRELIMINARY RESULTS FUNCTION ANNOTATION PRELIMINARY RESULTS FACTION I KAI YUAN KALYANI PATANKAR KIERA BERGER CAMILA MEDRANO HUBERT PAN JUNKE WANG YANXI CHEN AJAY RAMAKRISHNAN MRUNAL DEHANKAR OVERVIEW Introduction Previous

More information

-max_target_seqs: maximum number of targets to report

-max_target_seqs: maximum number of targets to report Review of exercise 1 tblastn -num_threads 2 -db contig -query DH10B.fasta -out blastout.xls -evalue 1e-10 -outfmt "6 qseqid sseqid qstart qend sstart send length nident pident evalue" Other options: -max_target_seqs:

More information

We have: We will: Assembled six genomes Made predictions of most likely gene locations. Add a layers of biological meaning to the sequences

We have: We will: Assembled six genomes Made predictions of most likely gene locations. Add a layers of biological meaning to the sequences Recap We have: Assembled six genomes Made predictions of most likely gene locations We will: Add a layers of biological meaning to the sequences Start with Biology This will motivate the choices we make

More information

Functional Annotation

Functional Annotation Functional Annotation Outline Introduction Strategy Pipeline Databases Now, what s next? Functional Annotation Adding the layers of analysis and interpretation necessary to extract its biological significance

More information

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models Last time Domains Hidden Markov Models Today Secondary structure Transmembrane proteins Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL

More information

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure Last time Today Domains Hidden Markov Models Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL SSLGPVVDAHPEYEEVALLERMVIPERVIE FRVPWEDDNGKVHVNTGYRVQFNGAIGPYK

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Genome Annotation Project Presentation

Genome Annotation Project Presentation Halogeometricum borinquense Genome Annotation Project Presentation Loci Hbor_05620 & Hbor_05470 Presented by: Mohammad Reza Najaf Tomaraei Hbor_05620 Basic Information DNA Coordinates: 527,512 528,261

More information

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013 EBI web resources II: Ensembl and InterPro Yanbin Yin Spring 2013 1 Outline Intro to genome annotation Protein family/domain databases InterPro, Pfam, Superfamily etc. Genome browser Ensembl Hands on Practice

More information

Public Database 의이용 (1) - SignalP (version 4.1)

Public Database 의이용 (1) - SignalP (version 4.1) Public Database 의이용 (1) - SignalP (version 4.1) 2015. 8. KIST 이철주 Secretion pathway prediction ProteinCenter (Proxeon Bioinformatics, Odense, Denmark; http://www.cbs.dtu.dk/services) SignalP (version 4.1)

More information

CSCE555 Bioinformatics. Protein Function Annotation

CSCE555 Bioinformatics. Protein Function Annotation CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Bioinformatics methods COMPUTATIONAL WORKFLOW

Bioinformatics methods COMPUTATIONAL WORKFLOW Bioinformatics methods COMPUTATIONAL WORKFLOW RAW READ PROCESSING: 1. FastQC on raw reads 2. Kraken on raw reads to ID and remove contaminants 3. SortmeRNA to filter out rrna 4. Trimmomatic to filter by

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

Comparative genomics: Overview & Tools + MUMmer algorithm

Comparative genomics: Overview & Tools + MUMmer algorithm Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first

More information

EBI web resources II: Ensembl and InterPro

EBI web resources II: Ensembl and InterPro EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course

More information

Functional Annotation & Comparative Genomics. Lu Wang, Georgia Tech

Functional Annotation & Comparative Genomics. Lu Wang, Georgia Tech Functional Annotation & Comparative Genomics Lu Wang, Georgia Tech Outline Functional annotation What is functional annotation? What needs to be annotated Approaches to functional annotation Pros/cons

More information

PROTEIN SUBCELLULAR LOCALIZATION PREDICTION BASED ON COMPARTMENT-SPECIFIC BIOLOGICAL FEATURES

PROTEIN SUBCELLULAR LOCALIZATION PREDICTION BASED ON COMPARTMENT-SPECIFIC BIOLOGICAL FEATURES 3251 PROTEIN SUBCELLULAR LOCALIZATION PREDICTION BASED ON COMPARTMENT-SPECIFIC BIOLOGICAL FEATURES Chia-Yu Su 1,2, Allan Lo 1,3, Hua-Sheng Chiu 4, Ting-Yi Sung 4, Wen-Lian Hsu 4,* 1 Bioinformatics Program,

More information

Meiothermus ruber Genome Analysis Project

Meiothermus ruber Genome Analysis Project Augustana College Augustana Digital Commons Meiothermus ruber Genome Analysis Project Biology 2018 Predicted ortholog pairs between E. coli and M. ruber are b3456 and mrub_2379, b3457 and mrub_2378, b3456

More information

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like SCOP all-β class 4-helical cytokines T4 endonuclease V all-α class, 3 different folds Globin-like TIM-barrel fold α/β class Profilin-like fold α+β class http://scop.mrc-lmb.cam.ac.uk/scop CATH Class, Architecture,

More information

Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The

More information

TMHMM2.0 User's guide

TMHMM2.0 User's guide TMHMM2.0 User's guide This program is for prediction of transmembrane helices in proteins. July 2001: TMHMM has been rated best in an independent comparison of programs for prediction of TM helices: S.

More information

Protein structure alignments

Protein structure alignments Protein structure alignments Proteins that fold in the same way, i.e. have the same fold are often homologs. Structure evolves slower than sequence Sequence is less conserved than structure If BLAST gives

More information

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

1-D Predictions. Prediction of local features: Secondary structure & surface exposure 1-D Predictions Prediction of local features: Secondary structure & surface exposure 1 Learning Objectives After today s session you should be able to: Explain the meaning and usage of the following local

More information

Gene function annotation

Gene function annotation Gene function annotation Paul D. Thomas, Ph.D. University of Southern California What is function annotation? The formal answer to the question: what does this gene do? The association between: a description

More information

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome Dr. Dirk Gevers 1,2 1 Laboratorium voor Microbiologie 2 Bioinformatics & Evolutionary Genomics The bacterial species in the genomic era CTACCATGAAAGACTTGTGAATCCAGGAAGAGAGACTGACTGGGCAACATGTTATTCAG GTACAAAAAGATTTGGACTGTAACTTAAAAATGATCAAATTATGTTTCCCATGCATCAGG

More information

Regulation of gene expression. Premedical - Biology

Regulation of gene expression. Premedical - Biology Regulation of gene expression Premedical - Biology Regulation of gene expression in prokaryotic cell Operon units system of negative feedback positive and negative regulation in eukaryotic cell - at any

More information

Prediction of signal peptides and signal anchors by a hidden Markov model

Prediction of signal peptides and signal anchors by a hidden Markov model In J. Glasgow et al., eds., Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, 122-13. AAAI Press, 1998. 1 Prediction of signal peptides and signal anchors by a hidden Markov model Henrik

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Homology. and. Information Gathering and Domain Annotation for Proteins

Homology. and. Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline WHAT IS HOMOLOGY? HOW TO GATHER KNOWN PROTEIN INFORMATION? HOW TO ANNOTATE PROTEIN DOMAINS? EXAMPLES AND EXERCISES Homology

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

A NEURAL NETWORK METHOD FOR IDENTIFICATION OF PROKARYOTIC AND EUKARYOTIC SIGNAL PEPTIDES AND PREDICTION OF THEIR CLEAVAGE SITES

A NEURAL NETWORK METHOD FOR IDENTIFICATION OF PROKARYOTIC AND EUKARYOTIC SIGNAL PEPTIDES AND PREDICTION OF THEIR CLEAVAGE SITES International Journal of Neural Systems, Vol. 8, Nos. 5 & 6 (October/December, 1997) 581 599 c World Scientific Publishing Company A NEURAL NETWORK METHOD FOR IDENTIFICATION OF PROKARYOTIC AND EUKARYOTIC

More information

Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space

Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space Published online February 15, 26 166 18 Nucleic Acids Research, 26, Vol. 34, No. 3 doi:1.193/nar/gkj494 Comprehensive genome analysis of 23 genomes provides structural genomics with new insights into protein

More information

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem University of Groningen Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

More information

Meiothermus ruber Genome Analysis Project

Meiothermus ruber Genome Analysis Project Augustana College Augustana Digital Commons Meiothermus ruber Genome Analysis Project Biology 2018 Examination of Orthologous Genes (Mrub_2518 and b3728, Mrub_2519 and b3727, Mrub_2520 and b3726, Mrub_2521

More information

Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple

More information

In silico analysis of subcellular localization of putative proteins of Mycobacterium tuberculosis H37Rv strain

In silico analysis of subcellular localization of putative proteins of Mycobacterium tuberculosis H37Rv strain ISPUB.COM The Internet Journal of Health Volume 7 Number 1 In silico analysis of subcellular localization of putative proteins of Mycobacterium tuberculosis H37Rv P Somvanshi, V Singh, P Seth Citation

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

Bio 119 Bacterial Genomics 6/26/10

Bio 119 Bacterial Genomics 6/26/10 BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis

More information

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC Motifs, Profiles and Domains Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC Comparing Two Proteins Sequence Alignment Determining the pattern of evolution and identifying conserved

More information

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17. Genetic Variation: The genetic substrate for natural selection What about organisms that do not have sexual reproduction? Horizontal Gene Transfer Dr. Carol E. Lee, University of Wisconsin In prokaryotes:

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

Some Problems from Enzyme Families

Some Problems from Enzyme Families Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems

More information

Introduction to Microbiology BIOL 220 Summer Session I, 1996 Exam # 1

Introduction to Microbiology BIOL 220 Summer Session I, 1996 Exam # 1 Name I. Multiple Choice (1 point each) Introduction to Microbiology BIOL 220 Summer Session I, 1996 Exam # 1 B 1. Which is possessed by eukaryotes but not by prokaryotes? A. Cell wall B. Distinct nucleus

More information

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007 -2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Microbiology BIOL 202 Lecture Course Outcome Guide (COG) Approved 22 MARCH 2012 Pg.1

Microbiology BIOL 202 Lecture Course Outcome Guide (COG) Approved 22 MARCH 2012 Pg.1 Microbiology BIOL 202 Lecture Course Outcome Guide (COG) Approved 22 MARCH 2012 Pg.1 Course: Credits: 3 Instructor: Course Description: Concepts and Issues 1. Microbial Ecology including mineral cycles.

More information

CRISPR-SeroSeq: A Developing Technique for Salmonella Subtyping

CRISPR-SeroSeq: A Developing Technique for Salmonella Subtyping Department of Biological Sciences Seminar Blog Seminar Date: 3/23/18 Speaker: Dr. Nikki Shariat, Gettysburg College Title: Probing Salmonella population diversity using CRISPRs CRISPR-SeroSeq: A Developing

More information

Signal peptides and protein localization prediction

Signal peptides and protein localization prediction Downloaded from orbit.dtu.dk on: Jun 30, 2018 Signal peptides and protein localization prediction Nielsen, Henrik Published in: Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics Publication

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Yeast ORFan Gene Project: Module 5 Guide
