Genome Annotation Project Presentation

Halogeometricum borinquense Genome Annotation Project Presentation Loci Hbor_05620 & Hbor_05470 Presented by: Mohammad Reza Najaf Tomaraei

Hbor_05620 Basic Information DNA Coordinates: 527,512 to 528,261 (reverse-strand ORF)

Hbor_05620 Basic Information DNA Nucleotide Sequence: ATGGCTTCTGACGTGTCCTCCCAAACCAGCGACCTTCCCGCACCCGTTCGGG CGTTCGGGAGCGTTGCTCTCGTCGTCGTCCTCGTCCTCCTCTGTGCCAGCATC TTCGTCTCCTTCGGGACAGCCATCTTCCGAACGGTCGGTATCGACCGCGGGA GCGCGCTTTACATCGCCCTGCGGAGTGGGTCACAGTTCGTTGGCTTCGGCGT CGCCGCCGTCGGCTACCTCACTGTGACTGACCAGTGGGAGTTAGTGTACCGC CGCGTGCCGTCGCTGCGCGATCTGAAGTGGATCACCGCCGGGTTCGGCGTCC TCCTCGTCCTCTATCTGGCCATCAACGTCGGCCTCACCGCGTTGGGAATCGAC AGCGGTGACAGCGCAGTTGCCGCGACAGCGGAGGGCCAGCCTGTGCTGTTG CTGTACTACATCCCGGTGACGCTGCTCCTCGTCGCACCGACGGAGGAACTGG TGTTCCGTGGGGTCGTGCAGGGACTGTTCCGGCGTGAGTACGGCGTCCCGTT CGCTATCGCGGGGTCGAGTCTGACGTTCGCATCGATCCACGCCACCTCCTTTA CCGGTGAGGGGGCGGTCGTCTCGCTCATCGTCGTCCTGATTCTCGGCGGCGT TCTCGGCATCGTCTACGAGAAAAGCGAGAGCCTAGTCGTCCCCGTCGTCGCG CACGGGCTCTACAACACCGTCCAGTTCGCCGCCACCTACGCGATGGCGGTTG GACTTGTAGGGTCAGTGTGA Sequence Length: 750 bases

Hbor_05620 Basic Information Amino acid / Protein Sequence: MASDVSSQTSDLPAPVRAFGSVALVVVLVLLCASIFVSFGTAIFRT VGIDRGSALYIALRSGSQFVGFGVAAVGYLTVTDQWELVYRRVPSL RDLKWITAGFGVLLVLYLAINVGLTALGIDSGDSAVAATAEGQPVLL LYYIPVTLLLVAPTEELVFRGVVQGLFRREYGVPFAIAGSSLTFASIH ATSFTGEGAVVSLIVVLILGGVLGIVYEKSESLVVPVVAHGLYNTVQ FAATYAMAVGLVGSV Sequence Length: 249 amino acids
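The coordinates, base count, and residue count reported on the three Basic Information slides can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes inclusive 1-based coordinates and a single stop codon that is counted in the DNA length but not in the protein length.

```python
def orf_length(start: int, end: int) -> int:
    """Length in bases of an ORF given inclusive, 1-based coordinates."""
    return abs(end - start) + 1


def expected_protein_length(n_bases: int) -> int:
    """Residues encoded by an ORF whose final codon is a stop codon."""
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    # One codon per residue, minus the terminal stop codon.
    return n_bases // 3 - 1


n_bases = orf_length(527_512, 528_261)
print(n_bases, expected_protein_length(n_bases))  # prints: 750 249
```

Both values agree with the sequence lengths stated on the slides.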

Hbor_05620 Sequence-based Similarity Data Standard Protein BLAST
Top hit:
  Gene product name: metal-dependent membrane protease
  Organism: Halosarcina pallida (low-salt archaeon)
  Alignment length: 249
  Score: 276 bits
  E-value: 2e-89 (2 x 10^-89)
Second hit:
  Gene product name: CAAX amino terminal protease family protein
  Organism: Haloferax denitrificans (high-salt archaeon)
  Alignment length: 245
  Score: 200 bits
  E-value: 1e-59 (1 x 10^-59)

Hbor_05620 Sequence-based Similarity Data Conserved Domain Database Search (CDD)
Top hit:
  COG number: pfam02517
  COG name: Abi (CAAX protease self-immunity)
  E-value: 2.33e-15 (2.33 x 10^-15)
Second hit:
  COG number: COG1266
  COG name: COG1266 (Predicted metal-dependent membrane protease)
  E-value: 2.73e-11 (2.73 x 10^-11)

Hbor_05620 Sequence-based Similarity Data Sequence Logo (T-Coffee & WebLogo)
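In a sequence logo, the height of each stack reflects how conserved that alignment column is, measured in bits. As a rough sketch of that idea (not WebLogo's exact implementation, which also applies a small-sample correction), the information content of a protein column can be computed as log2(20) minus the column's Shannon entropy. The function name and the toy columns are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, ignoring gaps.

    Simplified: log2(alphabet_size) - Shannon entropy, with no
    small-sample correction.
    """
    residues = [c for c in column.upper() if c.isalpha()]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# Toy columns (hypothetical, not taken from the project alignment).
print(round(column_information("EEEE"), 2))  # fully conserved column
print(round(column_information("EDQ-"), 2))  # variable column with a gap
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, while mixed columns score lower.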

Hbor_05620 Cellular Localization Data Gram Stain According to Montalvo-Rodriguez et al. (1998), who first isolated H. borinquense and Gram-stained it, this archaeon stains Gram-negative (pink), which suggests that its structure is similar to the following diagram (of a Gram-negative bacterium):

Hbor_05620 Cellular Localization Data Transmembrane Helices Hidden Markov Model TMHMM predicted 7 transmembrane helices, with a possible N-terminal signal sequence. Transmembrane topology graph:
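TMHMM itself uses a hidden Markov model, which is not reproduced here. As a much simpler stand-in for the same intuition (membrane-spanning segments are runs of hydrophobic residues), the sketch below computes a Kyte-Doolittle sliding-window hydropathy profile. The function names are hypothetical, and the 19-residue window with a 1.6 cutoff is the conventional rule of thumb for that scale rather than anything TMHMM uses.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[r] for r in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """0-based start positions of windows at or above the cutoff."""
    return [i for i, v in enumerate(hydropathy_profile(seq, window)) if v >= cutoff]


# N-terminal stretch of the Hbor_05620 protein sequence from the slides.
n_term = "MASDVSSQTSDLPAPVRAFGSVALVVVLVLLCASIFVSFGTAIFRT"
print(hydrophobic_windows(n_term))
```

Windows that clear the cutoff are only candidates for membrane-spanning segments; the helix count and topology still come from TMHMM.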

Hbor_05620 Cellular Localization Data SignalP SignalP did not predict a signal peptide because the discrimination score was below the cutoff. However, the signal peptide graph showed some notable score spikes:

Hbor_05620 Cellular Localization Data LipoP Best prediction: Transmembrane helix. No plots made.

Hbor_05620 Cellular Localization Data PSORTb Strongly predicted the subcellular localization of this protein to be in the Cytoplasmic Membrane, with a score of 9.99 (all other scores were zero).

Hbor_05620 Cellular Localization Data Phobius Phobius probability graph (with a flat-line signal peptide probability of 0):

Hbor_05620 Cellular Localization Data Hypothesis This integral membrane protein spans the cytoplasmic membrane of the cell, with faces exposed on both the inner and outer sides of the membrane.

Hbor_05620 START Codon / Alternative ORF No Shine-Dalgarno sequence is present 8-13 base pairs upstream of the current START codon.
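The upstream check can be sketched as a simple motif scan. Everything below is an assumption for illustration: the function name is hypothetical, the AGGAGG consensus and its 4-base sub-motifs stand in for whatever criteria were actually used, and the demo sequences are made up because the upstream region is not shown on the slides. For a reverse-strand ORF, the upstream sequence must first be read from the reverse complement.

```python
SD_CONSENSUS = "AGGAGG"
# Every 4-base piece of the consensus counts as an SD-like match here.
SD_KMERS = {SD_CONSENSUS[i:i + 4] for i in range(len(SD_CONSENSUS) - 3)}


def has_sd_like_motif(upstream: str, near: int = 8, far: int = 13) -> bool:
    """Report whether an SD-like 4-mer lies within `near`..`far` bases upstream.

    `upstream` is read 5'->3' on the coding strand and must end immediately
    before the start codon.
    """
    upstream = upstream.upper()
    if len(upstream) < far:
        raise ValueError("need at least `far` bases of upstream sequence")
    # Positions -far .. -near relative to the first base of the start codon.
    window = upstream[-far:-(near - 1)] if near > 1 else upstream[-far:]
    return any(kmer in window for kmer in SD_KMERS)


# Made-up upstream regions, for demonstration only.
print(has_sd_like_motif("TTTTAGGAGGTTTTTTT"))   # motif 8-13 bases upstream
print(has_sd_like_motif("TTTTTTTTTTTTTTTTT"))   # no motif anywhere
```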

Hbor_05620 Annotation (Conclusion) This gene most probably codes for a metal-dependent, membrane-bound peptidase (intramembrane metalloprotease) that is involved in the release of certain transcription factors. According to Wolfe (2009), similar intramembrane proteases are found in bacteria and archaea, where they play an essential role in the proteolysis (hydrolysis of proteins into smaller polypeptides) of membrane-bound transcription factors needed to control the expression of certain genes.

Hbor_05620 Annotation (Conclusion) The spike observed in SignalP could mark the region where the protein binds its activating metal ion (most likely zinc). Once activated, the enzyme probably cleaves its substrate protein at the active site, eventually releasing the necessary transcription factors.

Hbor_05620 Annotation (Conclusion) Source: LookForDiagnosis.com

Hbor_05470 Basic Information DNA Coordinates: 510,367 to 513,576 (reverse-strand ORF)

Hbor_05470 Basic Information DNA Nucleotide Sequence: GTGACCGACTCGTACACCGTCTTGGTCGTCGGCACGTTGCCATCCCGGTTCCATACCGAG CGATTCGAGGCGGCGTTCGACGACGCAACGCTCAGGTGGGTCGAACAACCCGAGGGAAATT CGACCGTCTTCGAGGCCACAGACTGTATCCTCGCCACCATGGAAGTCGTCTCGTCGGGGGA TTTCGATCCTGAGGCCGCCGCTGTCCCCGTTTTGCTGATCGGTGACAGAGAGGACAGTATC GCAGAAATCGCACTCTCGACGGATGTCGTAGACTATCTCGCCGTCCGGGGAGTGGATGCGG AGGCGACGTGGTTAGCCAACCGGATGGAGGCGGCCGCTGACTCCTATCGGACGGACAAAA AGCGGGCGCAACTCGACAGACAACAGCGAGCGCTTTCAGATCTCGGCGCGTTCGCGCTCTC CGGGCCGGCGCGAGAGGAAGTGTTCGCCGAAACCGTCGAAATCGTTACCGAAACGCTCGAT GCCGGGCGGGCCGCTCTCCTCCAGTCGCGCCCTGAACACGGTGACCTGTCGATTGTCGCC GCCAAAGGCTGGCCGCAAGTCTACGTCGGCGGCGTCGCCGTCGGACTCGACTCCGGGCCG GGACGGGCACTCACGAACCGAGAGCCAGTCGTCGAAAACGACCTGT [... too long...] GGACGACGACGAACACCTCTATGAGGTCCAGCCGGCGGACACGACGCCGTTCGAGACAGT GTACGCTGGACGTGGCAGACTCCGCGAGATGGTGGCCGAAAACGGCGTCTGTACAGTGTC GCTGACGATTCCTTCGGATGTAAGCGTGCGGTCGGTCGTGGACGCATTCGCCGCAACGTAC TCCCGGACGACGCTCGCTGCTCGTCGAACGCTCACGGAACCGACCGACTCGGTCGGGAGTT TCCGAGCGCGCCTCGACGAAGTGTGGACCGAACGACAGCGAGAAGCAATCTCGGCCGCAC TCCACGGAGGACTGTACGACTGGCCGCGCAAGACATCCGTCTCGACGCTCTCGGAAGCGTT CGATGTCTCCTCGCCGACCTTTCAGTACCACCTCCGAGCCGCAGAGCGCAAACTGATCGAA CTCATTTTAGACTGA Sequence Length: 3210 bases

Hbor_05470 Basic Information Amino acid / Protein Sequence: MTDSYTVLVVGTLPSRFHTERFEAAFDDATLRWVEQPEGNSTVFEATDCILATMEVVSS GDFDPEAAAVPVLLIGDREDSIAEIALSTDVVDYLAVRGVDAEATWLANRMEAAADSYRT DKKRAQLDRQQRALSDLGAFALSGPAREEVFAETVEIVTETLDAGRAALLQSRPEHGDLS IVAAKGWPQVYVGGVAVGLDSGPGRALTNREPVVENDLSSETTELTAHLDAGSELSVVV GGGTEPWGVLTVHSSESGAFDETDARFTENVAALIAAVIERETLRTTLEEMFSRMDQGLI GLDNDWRVTYMNPEAERLLDTAASEVVGTNYWDLFDSDAVKPFRERYEKAVKTGEKVS FESYFPPHDRWYEVEAYPSQAGLSVYFADITDRTEREMELLRYERMVEAADDGVYALDS DQHIVQVNQAFAEMFSREQESLIGMHTTELIDEDTAAESALIQAEAARTGEPKRMEFKAEL PDGTEVWIETHFSAIVDEETDQFVGTVGVARDVTERRHRERSLTMLHERTREMAQADNA DAVVTRTIEGCHSLFDPCRAAFFDYDATTRTLERHPQSDEVDRGRYQSDGVNRDAPVES ENDPCWVAFTEERMVRVEEGTTVQFVPVGQYGVLAVERLSGATIRETDAEMLGLLAATM GELLGSVETKQALRSRDQQLEQQNERLTQLNRINRTVREVTRSVVHATTTEEATARACE RLVEAEPYQFAWLCEAPEEANSDDRVVPMTTTGVEDSYAARLTEAAQTSPFPELLSRVA STGRRAVVNDVLDDPAWEPHRRDALAQGFRSIAVVPAGNDRLLVVHGTRPDTFAGEDG DVLVELGETLGAVIDRLGRTQPILDERQTEVELEIRDDQHFLVRLSTATGETATVTGVVPT AEGDYRTFVRTAAPKNAVRDALPPGTLARELTDEDDDEHLYEVQPADTTPFETVYAGRG RLREMVAENGVCTVSLTIPSDVSVRSVVDAFAATYSRTTLAARRTLTEPTDSVGSFRARL DEVWTERQREAISAALHGGLYDWPRKTSVSTLSEAFDVSSPTFQYHLRAAERKLIELILD Sequence Length: 1069 amino acids
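Note that the Hbor_05470 nucleotide sequence begins with GTG while the protein begins with M. In bacteria and archaea, alternative start codons such as GTG and TTG are decoded as the initiator methionine. The sketch below shows one way to reproduce the first residues from the DNA; the function name is hypothetical, and the standard codon table is used with that start-codon rule added.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Alternative start codons read as Met when they initiate translation.
ALT_STARTS = {"GTG", "TTG"}


def translate_orf(dna: str) -> str:
    """Translate an ORF 5'->3', stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop line-wrap whitespace
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        if i == 0 and codon in ALT_STARTS:
            aa = "M"
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 30 bases of the Hbor_05470 sequence from the slides.
print(translate_orf("GTGACCGACTCGTACACCGTCTTGGTCGTC"))  # prints: MTDSYTVLVV
```

The output matches the first ten residues of the protein sequence on the slide.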

Hbor_05470 Sequence-based Similarity Data Standard Protein BLAST
Top hit:
  Gene product name: PAS domain S-box (sensor)
  Organism: Halosarcina pallida (low-salt archaeon)
  Alignment length: 1067
  Score: 1453 bits
  E-value: 0 (high-quality match)
Second hit:
  Gene product name: light and oxygen sensing histidine kinase (typically transmembrane, signal transduction)
  Organism: Haloferax prahovense (high-salt archaeon)
  Alignment length: 1073
  Score: 546 bits
  E-value: 3e-172 (3 x 10^-172)

Hbor_05470 Sequence-based Similarity Data Conserved Domain Database Search (CDD)
Top hit:
  COG number: cd00130
  COG name: PAS
  E-value: 2.33e-12 (2.33 x 10^-12)
  Function: Ligand-binding sensors for light and oxygen in signal transduction (signal sensor)
Second hit:
  COG number: cd00130
  COG name: PAS
  E-value: 3.20e-10 (3.20 x 10^-10)

Hbor_05470 Sequence-based Similarity Data Conserved Domain Database Search (CDD)
Other notable specific hits:
  HTH_10 (helix-turn-helix DNA-binding protein)
    E-value: 2.99e-16 (2.99 x 10^-16)
    Function: Metal-regulated repressor
  GAF (GAF domain), pfam01590
    E-value: 1.11e-10 (1.11 x 10^-10)
    Function: Light-binding receptor, protein-protein (binding) interactions, autoinhibition/enzyme activation
  FhlA (GAF domain)
    E-value: 2.62e-08 (2.62 x 10^-8)
    Function: Signal transduction mechanism

Hbor_05470 Sequence-based Similarity Data Sequence Logo (T-Coffee & WebLogo)

Hbor_05470 Cellular Localization Data Gram Stain Gram-negative (as explained earlier)

Hbor_05470 Cellular Localization Data Transmembrane Helices Hidden Markov Model There were NO (0) transmembrane helices predicted by TMHMM. Transmembrane topology graph:

Hbor_05470 Cellular Localization Data SignalP SignalP did not predict a signal peptide because the discrimination score was below the cutoff. However, the signal peptide graph showed some notable score spikes:

Hbor_05470 Cellular Localization Data LipoP Best prediction: No reliable prediction (rejected negative score of -0.200913). No plots made.

Hbor_05470 Cellular Localization Data PSORTb
Made an ambiguous ("unknown") prediction, with the following scores:
  Cytoplasmic: 2.50
  Cytoplasmic Membrane: 2.50 (not helix)
  Cell Wall: 2.50
  Extracellular: 2.50
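The contrast between the two PSORTb results (9.99 for a single site in Hbor_05620, versus 2.50 everywhere for Hbor_05470) can be expressed as a simple decision rule. This is a sketch under stated assumptions, not PSORTb's own code: the function name is hypothetical, and the 7.5 cutoff is assumed here as the score a site must reach to be reported as the final prediction.

```python
def call_localization(scores: dict[str, float], cutoff: float = 7.5) -> str:
    """Return the top-scoring site if it reaches the cutoff, else 'Unknown'."""
    site, best = max(scores.items(), key=lambda item: item[1])
    return site if best >= cutoff else "Unknown"


# Scores as reported on the slides.
hbor_05620 = {
    "Cytoplasmic": 0.00,
    "Cytoplasmic Membrane": 9.99,
    "Cell Wall": 0.00,
    "Extracellular": 0.00,
}
hbor_05470 = {
    "Cytoplasmic": 2.50,
    "Cytoplasmic Membrane": 2.50,
    "Cell Wall": 2.50,
    "Extracellular": 2.50,
}

print(call_localization(hbor_05620))  # prints: Cytoplasmic Membrane
print(call_localization(hbor_05470))  # prints: Unknown
```

An even split across all four sites means no single site is favoured, which is why the other localization tools carry more weight for Hbor_05470.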

Hbor_05470 Cellular Localization Data Phobius Phobius probability graph (with an almost flatline signal peptide probability):

Hbor_05470 Cellular Localization Data Hypothesis It is hard to make a concise, specific prediction because of the very large size of this gene and the equivocal cellular localization results. I speculate that parts of this protein are located in cytoplasmic (DNA-binding), membranous (signal transduction), and extracellular (sensory) regions of the cell. However, it does NOT have transmembrane helices.

Hbor_05470 START Codon / Alternative ORF No Shine-Dalgarno sequence is present 8-13 base pairs upstream of the current START codon.

Hbor_05470 Annotation (Conclusion) This gene most probably codes for a relatively large PAS-domain protein with an S-box (sensory box). Further research showed that PAS S-box domains are part of many signaling proteins, where they act as the signal-sensing regions. The sensing mechanism of the S-box domain is linked to the cofactors or prosthetic groups it binds, such as:
  Heme (in oxygen sensors)
  FAD (in redox potential sensors)
  Chromophores (in photoactive sensors)

Hbor_05470 Annotation (Conclusion) Furthermore, proteins that contain the PAS S-box domain often contain other regulatory domains, such as:
  Response regulator or sensor histidine kinase domains (signal transduction)
  Phytochromes (photoreception)
Also, a closely related (but lower-ranked) BLAST-P hit predicted the gene product to be a bacterio-opsin activator-like protein, and there is evidence of an HTH (helix-turn-helix) motif in our sequence. It is therefore possible that the signal transduction pathway of this gene is linked to a DNA-binding region that plays a major role in regulating transcription.

Hbor_05470 Annotation (Conclusion) Overall, this relatively large PAS S-box protein is possibly involved in a signal transduction pathway that begins with its S-box sensing stimuli such as light and oxygen. The signal is then transduced to a region where specific DNA-binding transcription regulators allow expression of the required genes, ultimately resulting in translation of the necessary proteins.

Hbor_05470 Annotation (Conclusion) Source: Wikipedia

References
http://archives.microbeworld.org/microbes/archaea/eat.aspx
http://www.uprm.edu/biology/profs/rios/halo.pdf
http://www.jbc.org/content/284/21/13969
http://encyclopedia.thefreedictionary.com/metalloprotease
http://molpharm.aspetjournals.org/content/65/2/267.full
http://en.wikipedia.org/wiki/pas_domain
http://europepmc.org/abstract/med/2459111