Halogeometricum borinquense Genome Annotation Project Presentation Loci Hbor_05620 & Hbor_05470 Presented by: Mohammad Reza Najaf Tomaraei
Hbor_05620 Basic Information DNA Coordinates: 527,512 to 528,261 (reverse-strand ORF)
Hbor_05620 Basic Information DNA Nucleotide Sequence: ATGGCTTCTGACGTGTCCTCCCAAACCAGCGACCTTCCCGCACCCGTTCGGG CGTTCGGGAGCGTTGCTCTCGTCGTCGTCCTCGTCCTCCTCTGTGCCAGCATC TTCGTCTCCTTCGGGACAGCCATCTTCCGAACGGTCGGTATCGACCGCGGGA GCGCGCTTTACATCGCCCTGCGGAGTGGGTCACAGTTCGTTGGCTTCGGCGT CGCCGCCGTCGGCTACCTCACTGTGACTGACCAGTGGGAGTTAGTGTACCGC CGCGTGCCGTCGCTGCGCGATCTGAAGTGGATCACCGCCGGGTTCGGCGTCC TCCTCGTCCTCTATCTGGCCATCAACGTCGGCCTCACCGCGTTGGGAATCGAC AGCGGTGACAGCGCAGTTGCCGCGACAGCGGAGGGCCAGCCTGTGCTGTTG CTGTACTACATCCCGGTGACGCTGCTCCTCGTCGCACCGACGGAGGAACTGG TGTTCCGTGGGGTCGTGCAGGGACTGTTCCGGCGTGAGTACGGCGTCCCGTT CGCTATCGCGGGGTCGAGTCTGACGTTCGCATCGATCCACGCCACCTCCTTTA CCGGTGAGGGGGCGGTCGTCTCGCTCATCGTCGTCCTGATTCTCGGCGGCGT TCTCGGCATCGTCTACGAGAAAAGCGAGAGCCTAGTCGTCCCCGTCGTCGCG CACGGGCTCTACAACACCGTCCAGTTCGCCGCCACCTACGCGATGGCGGTTG GACTTGTAGGGTCAGTGTGA Sequence Length: 750 bases
Hbor_05620 Basic Information Amino acid / Protein Sequence: MASDVSSQTSDLPAPVRAFGSVALVVVLVLLCASIFVSFGTAIFRT VGIDRGSALYIALRSGSQFVGFGVAAVGYLTVTDQWELVYRRVPSL RDLKWITAGFGVLLVLYLAINVGLTALGIDSGDSAVAATAEGQPVLL LYYIPVTLLLVAPTEELVFRGVVQGLFRREYGVPFAIAGSSLTFASIH ATSFTGEGAVVSLIVVLILGGVLGIVYEKSESLVVPVVAHGLYNTVQ FAATYAMAVGLVGSV Sequence Length: 249 amino acids
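To check that an amino-acid sequence on these slides follows from its nucleotide sequence, the ORF can be translated in silico. The sketch below is a minimal translator, not the tool used in this project. It assumes the standard genetic code, which haloarchaea share (NCBI table 11 assigns the same amino acids and differs only in which start codons are allowed). It reads the initiator codon as methionine, so a GTG start also gives M. The function name `translate_orf` is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(dna: str) -> str:
    """Translate an ORF given 5'->3' on its coding strand.

    Whitespace (e.g. slide line breaks) is ignored, the initiator codon is
    read as M (so GTG/TTG starts also give M), and translation stops at the
    first stop codon, which is not included in the result.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append("M" if i == 0 else aa)
    return "".join(protein)


print(translate_orf("ATGGCTTCTGACGTG"))  # first five codons of Hbor_05620
print(translate_orf("GTGACCGACTCG"))     # Hbor_05470 begins with a GTG start
```

Applied to the full 750-base sequence, this should return a 249-residue protein if the nucleotide and protein slides are consistent.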
Hbor_05620 Sequence-based Similarity Data Standard Protein BLAST Top hit: Gene product name: metal-dependent membrane protease; Organism: Halosarcina pallida (low-salt archaeon); Alignment length: 249; Score: 276 bits; E-value: 2e-89 (2 × 10^-89). Second hit: Gene product name: CAAX amino-terminal protease family protein; Organism: Haloferax denitrificans (high-salt archaeon); Alignment length: 245; Score: 200 bits; E-value: 1e-59 (1 × 10^-59).
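For reading the BLAST numbers above: under Karlin-Altschul statistics, the bit score S' and the E-value are related approximately as shown below. Here m is the query length and n is the total length of the database. In practice BLAST uses effective lengths corrected for edge effects, so treat this as the underlying relation rather than the exact computation. Each additional bit of score halves the expected number of chance alignments, which is why E-values shrink so quickly as bit scores grow.

```latex
E \approx m \, n \, 2^{-S'}
```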
Hbor_05620 Sequence-based Similarity Data Conserved Domain Database Search (CDD) Top hit: Accession: pfam02517; Name: Abi (CAAX protease self-immunity); E-value: 2.33e-15 (2.33 × 10^-15). Second hit: Accession: COG1266; Name: COG1266 (Predicted metal-dependent membrane protease); E-value: 2.73e-11 (2.73 × 10^-11).
Hbor_05620 Sequence-based Similarity Data Sequence Logo (T-Coffee & WebLogo)
Hbor_05620 Cellular Localization Data Gram Stain According to Montalvo-Rodriguez et al. (1998), who first isolated H. borinquense and Gram-stained it, this archaeon is Gram-negative (pink). Its cell envelope is therefore compared here to that of a Gram-negative bacterium, shown in the following diagram:
Hbor_05620 Cellular Localization Data Transmembrane Helices Hidden Markov Model TMHMM predicted 7 transmembrane helices and flagged a POSSIBLE N-terminal signal sequence. Transmembrane topology graph:
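TMHMM results like these can be summarized programmatically. This sketch assumes the layout of TMHMM 2.0's short output: a sequence name followed by whitespace-separated `key=value` fields such as `PredHel` and `Topology`, where a topology string like `o5-27i40-62o` lists helix spans between inside (`i`) and outside (`o`) segments. It also assumes that a `First60` value above 10 is what triggers the possible-signal-sequence warning. The example record is made up for illustration and is not the real Hbor_05620 output. The function names are hypothetical.

```python
import re


def parse_tmhmm_short(line: str) -> dict:
    """Parse one line of TMHMM 2.0 short output (assumed layout: name,
    then whitespace-separated key=value fields)."""
    name, *fields = line.split()
    record = {"name": name}
    for field in fields:
        key, value = field.split("=", 1)
        record[key] = value
    return record


def summarize_topology(record: dict, first60_cutoff: float = 10.0) -> dict:
    """Count helices, locate both termini, and flag a possible N-terminal
    signal sequence (the First60 cutoff is an assumption)."""
    topology = record["Topology"]
    sides = re.findall(r"[io]", topology)  # i = inside, o = outside
    helices = re.findall(r"(\d+)-(\d+)", topology)
    return {
        "helices": len(helices),
        "n_term": "inside" if sides[0] == "i" else "outside",
        "c_term": "inside" if sides[-1] == "i" else "outside",
        "possible_signal": float(record["First60"]) > first60_cutoff,
    }


# Made-up record for illustration only (not the real Hbor_05620 result).
example = "example_protein\tlen=120\tExpAA=66.0\tFirst60=21.5\tPredHel=3\tTopology=o5-27i40-62o75-97i"
print(summarize_topology(parse_tmhmm_short(example)))
```

Because each helix crosses the membrane once, an odd helix count puts the two termini on opposite sides. This holds for the three helices in the example and for the seven predicted for Hbor_05620.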
Hbor_05620 Cellular Localization Data SignalP predicted that this is NOT a signal peptide, because the discrimination score (D-score) was below the cutoff. However, there were some notable spikes in the signal peptide score graph:
Hbor_05620 Cellular Localization Data LipoP Best prediction: Transmembrane helix. No plots made.
Hbor_05620 Cellular Localization Data PSORTb strongly predicted that this protein is localized to the cytoplasmic membrane, with a score of 9.99 (all other scores were zero).
Hbor_05620 Cellular Localization Data Phobius Phobius probability graph (with a flat-line signal peptide probability of 0):
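Hbor_05620 Cellular Localization Data Hypothesis This integral membrane protein spans the cytoplasmic membrane, with regions exposed on both the inner (cytoplasmic) and outer faces. Because it has an odd number (7) of transmembrane helices, its N- and C-termini should lie on opposite sides of the membrane.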
Hbor_05620 START Codon / Alternative ORF No Shine-Dalgarno regions present within 8-13 base pairs upstream of the current START codon.
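The Shine-Dalgarno check on this slide can be made reproducible with a short script. This sketch makes three assumptions. First, it uses a small, hypothetical set of purine-rich core motifs (GGAGG and its 4-base sub-words). Second, it measures distance from the motif's first base to the start codon. Third, it handles the fact that both loci here are listed as reverse ORFs: for those, it takes the upstream region from higher genome coordinates and reverse-complements it. The motif set, the distance convention, and the function names are illustrative assumptions, not the criteria used on the slide.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Hypothetical SD core motifs (assumption, not a curated list).
SD_MOTIFS = ("GGAGG", "GGAG", "GAGG", "AGGA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str, length: int = 20) -> str:
    """Return `length` bases immediately 5' of the start codon, read in the
    gene's own orientation. Coordinates are 1-based and inclusive, with
    start < end on both strands, as on the slides."""
    if strand == "+":
        return genome[max(0, start - 1 - length):start - 1]
    # Reverse-strand gene: its start codon sits at `end`, so the upstream
    # region lies at higher genome coordinates and is reverse-complemented.
    return reverse_complement(genome[end:end + length])


def find_sd(upstream: str, min_dist: int = 8, max_dist: int = 13, motifs=SD_MOTIFS):
    """List (motif, distance) pairs, where distance counts bases from the
    motif's first base to the start codon (assumed convention)."""
    hits = []
    n = len(upstream)
    for motif in motifs:
        for i in range(n - len(motif) + 1):
            if upstream[i:i + len(motif)] == motif and min_dist <= n - i <= max_dist:
                hits.append((motif, n - i))
    return hits


print(find_sd("TTTT" + "GGAGG" + "TTTTTT"))  # toy upstream sequence
```

An empty list from `find_sd` corresponds to the "no Shine-Dalgarno region" result reported on this slide.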
Hbor_05620 Annotation (Conclusion) This gene most probably codes for a metal-dependent, membrane-bound peptidase (intramembrane metalloprotease), which may be involved in releasing certain transcription factors. According to Wolfe (2009), similar intramembrane proteases are also found in bacteria and archaea, where they play an essential role in the proteolysis (hydrolysis of proteins into smaller polypeptides) of membrane-bound transcription factors needed to control the expression of certain genes.
Hbor_05620 Annotation (Conclusion) The spike observed in SignalP could mark the region where the protein binds its activating metal ion (possibly zinc). Once activated, this enzyme probably cleaves its substrate protein at the active site, eventually releasing the necessary transcription factors.
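One way to look for candidate active-site residues directly in the sequence is a motif scan. The sketch below searches the Hbor_05620 protein for three short motifs: EExxxR, FxxxH, and HxxxN. As far as we recall, these are conserved in the CAAX protease (Abi, pfam02517) family and contain the glutamates and histidines thought to form its active site. Treat that motif list as an assumption to be checked against the family's Pfam/CDD entry. The scanning code itself is generic.

```python
import re

# Hbor_05620 protein sequence as given on the slide (whitespace is ignored).
HBOR_05620 = "".join("""
MASDVSSQTSDLPAPVRAFGSVALVVVLVLLCASIFVSFGTAIFRT VGIDRGSALYIALRSGSQFVGFGVAAVGYLTVTDQWELVYRRVPSL
RDLKWITAGFGVLLVLYLAINVGLTALGIDSGDSAVAATAEGQPVLL LYYIPVTLLLVAPTEELVFRGVVQGLFRREYGVPFAIAGSSLTFASIH
ATSFTGEGAVVSLIVVLILGGVLGIVYEKSESLVVPVVAHGLYNTVQ FAATYAMAVGLVGSV
""".split())

# Assumed family motifs written as regular expressions (x = any residue).
MOTIFS = {"EExxxR": "EE...R", "FxxxH": "F...H", "HxxxN": "H...N"}


def scan_motifs(protein: str, motifs: dict = MOTIFS) -> dict:
    """Return {motif name: [(1-based start, matched residues), ...]}."""
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead with a capture group also reports overlapping matches.
        hits[name] = [(m.start() + 1, m.group(1))
                      for m in re.finditer(f"(?=({pattern}))", protein)]
    return hits


print(len(HBOR_05620))
for name, found in scan_motifs(HBOR_05620).items():
    print(name, found)
```

Each motif occurs once in this sequence, which is consistent with the Abi/CAAX protease assignment from CDD. The positions printed by the script locate the candidate catalytic residues.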
Hbor_05620 Annotation (Conclusion) Source: LookForDiagnosis.com
Hbor_05470 Basic Information DNA Coordinates: 510,367 to 513,576 (reverse-strand ORF)
Hbor_05470 Basic Information DNA Nucleotide Sequence: GTGACCGACTCGTACACCGTCTTGGTCGTCGGCACGTTGCCATCCCGGTTCCATACCGAG CGATTCGAGGCGGCGTTCGACGACGCAACGCTCAGGTGGGTCGAACAACCCGAGGGAAATT CGACCGTCTTCGAGGCCACAGACTGTATCCTCGCCACCATGGAAGTCGTCTCGTCGGGGGA TTTCGATCCTGAGGCCGCCGCTGTCCCCGTTTTGCTGATCGGTGACAGAGAGGACAGTATC GCAGAAATCGCACTCTCGACGGATGTCGTAGACTATCTCGCCGTCCGGGGAGTGGATGCGG AGGCGACGTGGTTAGCCAACCGGATGGAGGCGGCCGCTGACTCCTATCGGACGGACAAAA AGCGGGCGCAACTCGACAGACAACAGCGAGCGCTTTCAGATCTCGGCGCGTTCGCGCTCTC CGGGCCGGCGCGAGAGGAAGTGTTCGCCGAAACCGTCGAAATCGTTACCGAAACGCTCGAT GCCGGGCGGGCCGCTCTCCTCCAGTCGCGCCCTGAACACGGTGACCTGTCGATTGTCGCC GCCAAAGGCTGGCCGCAAGTCTACGTCGGCGGCGTCGCCGTCGGACTCGACTCCGGGCCG GGACGGGCACTCACGAACCGAGAGCCAGTCGTCGAAAACGACCTGT [... too long...] GGACGACGACGAACACCTCTATGAGGTCCAGCCGGCGGACACGACGCCGTTCGAGACAGT GTACGCTGGACGTGGCAGACTCCGCGAGATGGTGGCCGAAAACGGCGTCTGTACAGTGTC GCTGACGATTCCTTCGGATGTAAGCGTGCGGTCGGTCGTGGACGCATTCGCCGCAACGTAC TCCCGGACGACGCTCGCTGCTCGTCGAACGCTCACGGAACCGACCGACTCGGTCGGGAGTT TCCGAGCGCGCCTCGACGAAGTGTGGACCGAACGACAGCGAGAAGCAATCTCGGCCGCAC TCCACGGAGGACTGTACGACTGGCCGCGCAAGACATCCGTCTCGACGCTCTCGGAAGCGTT CGATGTCTCCTCGCCGACCTTTCAGTACCACCTCCGAGCCGCAGAGCGCAAACTGATCGAA CTCATTTTAGACTGA Sequence Length: 3210 bases
Hbor_05470 Basic Information Amino acid / Protein Sequence: MTDSYTVLVVGTLPSRFHTERFEAAFDDATLRWVEQPEGNSTVFEATDCILATMEVVSS GDFDPEAAAVPVLLIGDREDSIAEIALSTDVVDYLAVRGVDAEATWLANRMEAAADSYRT DKKRAQLDRQQRALSDLGAFALSGPAREEVFAETVEIVTETLDAGRAALLQSRPEHGDLS IVAAKGWPQVYVGGVAVGLDSGPGRALTNREPVVENDLSSETTELTAHLDAGSELSVVV GGGTEPWGVLTVHSSESGAFDETDARFTENVAALIAAVIERETLRTTLEEMFSRMDQGLI GLDNDWRVTYMNPEAERLLDTAASEVVGTNYWDLFDSDAVKPFRERYEKAVKTGEKVS FESYFPPHDRWYEVEAYPSQAGLSVYFADITDRTEREMELLRYERMVEAADDGVYALDS DQHIVQVNQAFAEMFSREQESLIGMHTTELIDEDTAAESALIQAEAARTGEPKRMEFKAEL PDGTEVWIETHFSAIVDEETDQFVGTVGVARDVTERRHRERSLTMLHERTREMAQADNA DAVVTRTIEGCHSLFDPCRAAFFDYDATTRTLERHPQSDEVDRGRYQSDGVNRDAPVES ENDPCWVAFTEERMVRVEEGTTVQFVPVGQYGVLAVERLSGATIRETDAEMLGLLAATM GELLGSVETKQALRSRDQQLEQQNERLTQLNRINRTVREVTRSVVHATTTEEATARACE RLVEAEPYQFAWLCEAPEEANSDDRVVPMTTTGVEDSYAARLTEAAQTSPFPELLSRVA STGRRAVVNDVLDDPAWEPHRRDALAQGFRSIAVVPAGNDRLLVVHGTRPDTFAGEDG DVLVELGETLGAVIDRLGRTQPILDERQTEVELEIRDDQHFLVRLSTATGETATVTGVVPT AEGDYRTFVRTAAPKNAVRDALPPGTLARELTDEDDDEHLYEVQPADTTPFETVYAGRG RLREMVAENGVCTVSLTIPSDVSVRSVVDAFAATYSRTTLAARRTLTEPTDSVGSFRARL DEVWTERQREAISAALHGGLYDWPRKTSVSTLSEAFDVSSPTFQYHLRAAERKLIELILD Sequence Length: 1069 amino acids
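The coordinates and lengths on the Basic Information slides of both genes can be cross-checked with simple arithmetic. An ORF spanning 1-based positions start to end has end - start + 1 bases. Since the stop codon is not translated, it encodes (end - start + 1)/3 - 1 residues. The helper name is illustrative.

```python
def orf_lengths(start: int, end: int) -> tuple[int, int]:
    """Nucleotide length of an ORF from 1-based inclusive coordinates, and
    the implied protein length (number of codons minus the stop codon)."""
    nt = end - start + 1
    if nt % 3:
        raise ValueError(f"{nt} bases is not a whole number of codons")
    return nt, nt // 3 - 1


for locus, (start, end) in {"Hbor_05620": (527_512, 528_261),
                            "Hbor_05470": (510_367, 513_576)}.items():
    print(locus, orf_lengths(start, end))  # (750, 249) and (3210, 1069)
```

Both results match the sequence lengths reported on the slides (750 bases / 249 amino acids and 3210 bases / 1069 amino acids).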
Hbor_05470 Sequence-based Similarity Data Standard Protein BLAST Top hit: Gene product name: PAS domain S-box (sensor); Organism: Halosarcina pallida (low-salt archaeon); Alignment length: 1067; Score: 1453 bits; E-value: 0 (too small to report, i.e., an extremely strong match). Second hit: Gene product name: light- and oxygen-sensing histidine kinase (typically transmembrane, signal transduction); Organism: Haloferax prahovense (high-salt archaeon); Alignment length: 1073; Score: 546 bits; E-value: 3e-172 (3 × 10^-172).
Hbor_05470 Sequence-based Similarity Data Conserved Domain Database Search (CDD) Top hit: Accession: cd00130; Name: PAS; E-value: 2.33e-12 (2.33 × 10^-12); Function: Ligand-binding sensors for light and oxygen in signal transduction (signal sensor). Second hit: Accession: cd00130; Name: PAS; E-value: 3.20e-10 (3.20 × 10^-10).
Hbor_05470 Sequence-based Similarity Data Conserved Domain Database Search (CDD) Other notable specific hits: HTH_10 (helix-turn-helix DNA-binding domain); E-value: 2.99e-16 (2.99 × 10^-16); Function: metal-regulated repressor. GAF (GAF domain), pfam01590; E-value: 1.11e-10 (1.11 × 10^-10); Function: light-binding receptor, protein-protein (binding) interactions, autoinhibition/enzyme activation. FhlA (GAF domain); E-value: 2.62e-08 (2.62 × 10^-8); Function: signal transduction mechanisms.
Hbor_05470 Sequence-based Similarity Data Sequence Logo (T-Coffee & WebLogo)
Hbor_05470 Cellular Localization Data Gram Stain Gram-negative (as explained earlier)
Hbor_05470 Cellular Localization Data Transmembrane Helices Hidden Markov Model There were NO (0) transmembrane helices predicted by TMHMM. Transmembrane topology graph:
Hbor_05470 Cellular Localization Data SignalP predicted that this is NOT a signal peptide, because the discrimination score (D-score) was below the cutoff. However, there were some notable spikes in the signal peptide score graph:
Hbor_05470 Cellular Localization Data LipoP Best prediction: No reliable prediction (the best score, -0.200913, was negative and therefore rejected). No plots made.
Hbor_05470 Cellular Localization Data PSORTb returned an ambiguous (Unknown) prediction, with equal scores for all four sites: Cytoplasmic: 2.50; Cytoplasmic Membrane: 2.50 (no helices detected); Cell Wall: 2.50; Extracellular: 2.50.
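The contrast between the two PSORTb results comes down to whether a single site dominates the scores. This sketch assumes, as we understand PSORTb's documentation, that the site scores sum to 10 and that a site is reported as the final localization only if its score reaches 7.5; otherwise the result is Unknown. The function name and cutoff constant are assumptions for illustration, and the scores are the ones reported on these slides.

```python
PSORTB_CUTOFF = 7.5  # assumed final-prediction cutoff


def psortb_call(scores: dict[str, float], cutoff: float = PSORTB_CUTOFF) -> str:
    """Return the best-scoring site if it clears the cutoff, else 'Unknown'."""
    site, best = max(scores.items(), key=lambda kv: kv[1])
    return site if best >= cutoff else "Unknown"


hbor_05620 = {"Cytoplasmic": 0.0, "Cytoplasmic Membrane": 9.99, "Cell Wall": 0.0, "Extracellular": 0.0}
hbor_05470 = {"Cytoplasmic": 2.5, "Cytoplasmic Membrane": 2.5, "Cell Wall": 2.5, "Extracellular": 2.5}
print(psortb_call(hbor_05620))
print(psortb_call(hbor_05470))
```

An even 2.50 split across four sites can never reach the cutoff, so Unknown is the only possible outcome for Hbor_05470.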
Hbor_05470 Cellular Localization Data Phobius Phobius probability graph (with an almost flat-line signal peptide probability):
Hbor_05470 Cellular Localization Data Hypothesis Because of the large size of this protein and the equivocal cellular localization data, a concise and specific prediction is difficult. I speculate that its domains act in cytoplasmic (DNA-binding), membrane-associated (signal transduction), and extracellular (sensory) regions of the cell. However, it has NO transmembrane helices, so any membrane association would have to be peripheral rather than membrane-spanning.
Hbor_05470 START Codon / Alternative ORF No Shine-Dalgarno regions present within 8-13 base pairs upstream of the current START codon.
Hbor_05470 Annotation (Conclusion) This gene most probably codes for a relatively large protein containing PAS domains, also called S-boxes (sensory boxes). Further research showed that PAS (S-box) domains occur in many signaling proteins, where they act as the signal-sensing region. Many PAS domains sense their stimulus through a bound cofactor (prosthetic group), such as: heme (in oxygen sensors); FAD (in redox-potential sensors); chromophores (in photoactive sensors).
Hbor_05470 Annotation (Conclusion) Furthermore, proteins that contain PAS (S-box) domains often contain other regulatory domains, such as: response regulator or sensor histidine kinase domains (signal transduction); phytochrome domains (photoreception). Also, a closely related (but lower-scoring) BLASTP hit predicted the gene product to be a bacterio-opsin activator-like protein, and the CDD search found an HTH (helix-turn-helix) motif in our sequence. The signal transduction pathway of this gene may therefore be linked to a DNA-binding region that plays a major role in regulating transcription.
Hbor_05470 Annotation (Conclusion) Overall, this large PAS (S-box) protein is possibly part of a signal transduction pathway. Its S-box senses stimuli such as light or oxygen, and the signal is relayed to a DNA-binding (HTH) region that regulates transcription. This ultimately allows expression of the required genes and synthesis of the necessary proteins.
Hbor_05470 Annotation (Conclusion) Source: Wikipedia
References
http://archives.microbeworld.org/microbes/archaea/eat.aspx
http://www.uprm.edu/biology/profs/rios/halo.pdf
http://www.jbc.org/content/284/21/13969
http://encyclopedia.thefreedictionary.com/metalloprotease
http://molpharm.aspetjournals.org/content/65/2/267.full
http://en.wikipedia.org/wiki/pas_domain
http://europepmc.org/abstract/med/2459111