DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene

Similar documents
Practical Bioinformatics

SUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA

SUPPLEMENTARY DATA - 1 -

High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

Crick s early Hypothesis Revisited

Number-controlled spatial arrangement of gold nanoparticles with

Supporting Information for. Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)-

Advanced topics in bioinformatics

Characterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin

Clay Carter. Department of Biology. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture.

Supplementary Information for

Supplemental data. Pommerrenig et al. (2011). Plant Cell /tpc

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

NSCI Basic Properties of Life and The Biochemistry of Life on Earth

SSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr),

Supporting Information

Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy

Electronic supplementary material

Supplementary Information

Supplemental Figure 1.

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

Regulatory Sequence Analysis. Sequence models (Bernoulli and Markov models)

Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R

Supplemental Table 1. Primers used for cloning and PCR amplification in this study

3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies

evoglow - express N kit Cat. No.: product information broad host range vectors - gram negative bacteria

TM1 TM2 TM3 TM4 TM5 TM6 TM bp

evoglow - express N kit distributed by Cat.#: FP product information broad host range vectors - gram negative bacteria

Evolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval

SUPPLEMENTARY INFORMATION

Modelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics

SUPPLEMENTARY INFORMATION

The role of the FliD C-terminal domain in pentamer formation and

160, and 220 bases, respectively, shorter than pbr322/hag93. (data not shown). The DNA sequence of approximately 100 bases of each

Protein Threading. Combinatorial optimization approach. Stefan Balev.

The Trigram and other Fundamental Philosophies

Biosynthesis of Bacterial Glycogen: Primary Structure of Salmonella typhimurium ADPglucose Synthetase as Deduced from the

Codon Distribution in Error-Detecting Circular Codes

ydci GTC TGT TTG AAC GCG GGC GAC TGG GCG CGC AAT TAA CGG TGT GTA GGC TGG AGC TGC TTC

Why do more divergent sequences produce smaller nonsynonymous/synonymous

Encoding of Amino Acids and Proteins from a Communications and Information Theoretic Perspective

Re- engineering cellular physiology by rewiring high- level global regulatory genes

ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line

Identification of a Locus Involved in the Utilization of Iron by Haemophilus influenzae

part 3: analysis of natural selection pressure

Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi

Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass

Supplementary Information

The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded DNA Sequence Is Self-Designed as a Numerical Whole

Introduction to Molecular Phylogeny

FliZ Is a Posttranslational Activator of FlhD 4 C 2 -Dependent Flagellar Gene Expression

Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila

Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry

Near-instant surface-selective fluorogenic protein quantification using sulfonated

DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene

Supporting Information. An Electric Single-Molecule Hybridisation Detector for short DNA Fragments

Timing molecular motion and production with a synthetic transcriptional clock

Objective: You will be able to justify the claim that organisms share many conserved core processes and features.

AtTIL-P91V. AtTIL-P92V. AtTIL-P95V. AtTIL-P98V YFP-HPR

Evolutionary Analysis of Viral Genomes

Supplemental Figure 1. Phenotype of ProRGA:RGAd17 plants under long day

From DNA to protein, i.e. the central dogma

Diversity of Chlamydia trachomatis Major Outer Membrane

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.

THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE

Chemical Biology on Genomic DNA: minimizing PCR bias. Electronic Supplementary Information (ESI) for Chemical Communications

Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation.

Aoife McLysaght Dept. of Genetics Trinity College Dublin

evoglow yeast kit distributed by product information Cat.#: FP-21040

dead, a New Escherichia coli Gene Encoding a Presumed

ANALYZING THE DIVERSITY OF A SMALL ANTIBODY MIMIC LIBRARY. Nick Empey. Chapel Hill 2010

evoglow basic kit product information

Insects act as vectors for a number of important diseases of

Nature Genetics: doi:0.1038/ng.2768

It is the author's version of the article accepted for publication in the journal "Biosystems" on 03/10/2015.

PROTEIN SYNTHESIS INTRO

Cloning, Nucleotide Sequencing, and Expression of the Clostridium perfringens Enterotoxin Gene in Eschenichia coli

Supplementary information. Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization.

Using algebraic geometry for phylogenetic reconstruction

Supporting Information

Symmetry Studies. Marlos A. G. Viana

Biology 112 Practice Midterm Questions

Slide 1 / 54. Gene Expression in Eukaryotic cells

Evidence for RNA editing in mitochondria of all major groups of

arab Gene and Nucleotide Sequence of the arac Gene of Erwinia carotovora

The Physical Language of Molecules

Isolation of the hemf Operon Containing the Gene for the Escherichia coli Aerobic Coproporphyrinogen III Oxidase by

The effects of leader peptide sequence and length on attenuation control of the trp operon of E.coli

Regulation of the Bacillus subtilis Heat Shock Gene htpg Is under Positive Control

Appendix B Protein-Signaling Networks from Single-cell Fluctuations and Information Theory Profiling B.1. Introduction

Sequence Divergence & The Molecular Clock. Sequence Divergence

Electronic Supporting Information for

Characterization of Multiple-Antimicrobial-Resistant Salmonella Serovars Isolated from Retail Meats

Motif Finding Algorithms. Sudarsan Padhy IIIT Bhubaneswar

Supplemental data. Vos et al. (2008). The plant TPX2 protein regulates pro-spindle assembly before nuclear envelope breakdown.

Evidence for Evolution: Change Over Time (Make Up Assignment)

codon substitution models and the analysis of natural selection pressure

Sequence analysis and comparison

Supplemental Figure 1. Differences in amino acid composition between the paralogous copies Os MADS17 and Os MADS6.

Transcription:

QD) 1990 Oxford University Press Nucleic Acids Research, Vol. 18, No. 17 5045 DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene David Lodwick, Darerca Owen and Peter Strike* Department of Genetics and Microbiology, University of Liverpool, PO Box 147, Liverpool L69 3BX, UK Received June 19, 1990; Revised and Accepted July 31, 1990 EMBL accession no. X53528 ABSTRACT The sequence of the imp operon of the plasmid TP1 10 (which belongs to the Incl1 incompatibility group) has been determined, and is shown to contain three open reading frames. This operon, involved in UV protection and mutation, is functionally analogous to the umudc operon of E.coli and the mucab operon of the plasmid pkm101, which belongs to the quite unrelated lncn incompatibility group. The umu and muc operons however contain only two open reading frames, coding for proteins of approximately 16kD and 46kD. The high degree of homology between the two 16kD proteins (UmuD and MucA) and between the two 46kD proteins (UmuC and MucB) clearly shows their relatedness. This is shown also to extend to the imp gene products, with ImpA sharing homology with UmuD and MucA, and lmpb sharing homology with UmuC and MucB. However, the two imp genes are preceded in the operon by a third gene, impc, which encodes a small protein of 9.5kD and which has no equivalent in the umu and muc operons. INTRODUCTION In Escherichia coli, the mutagenic responses to UV light and to a large number of chemical mutagens are largely dependent upon the inducible pathway commonly referred to as 'errorprone repair' (see (1) for a review). The essential components of this pathway include both the RecA protein and the products of the umu operon, UmuC and UmuD. These proteins, in some as yet uncharacterised way, may modify the DNA replication machinery to allow bypass synthesis across a damaged template. Several lines of evidence (2,3,4) suggest that the umu gene products may facilitate the prompt resumption of chain elongation at sites where the replication forks have been stalled by DNA damage. Phenotypically, umucd mutants show a greatly reduced mutagenic response to UV and many chemical mutagens, and they are also slightly UV sensitive. Although their gene products play an important role in mutation in E. coli, it has become apparent that many bacterial species do not carry equivalent genes (5). Even in the closely related species Salmonella typhimurium, the umu gene equivalents appear to be ineffective, and mutation in this species depends to a considerable extent on the presence of plasmids from a variety of incompatibility (Inc) groups, carrying functional analogues of the umu genes. Thus the sensitivity of the strains employed in the Ames mutagenicity test is greatly increased by the presence of the IncN plasmid pkmioi, which carries the muc analogue of the umu operon (6). Sequence analysis of both the umu and muc operons has revealed that although there is a wide divergence between these systems at the level of DNA sequence, they are closely related in terms of their gene products. Both operons contain two genes encoding proteins of MW15,064 and 47,681 in the case of umu (umud and umuc respectively), and MW16,371 and 46,362 in the case of muc (muca and mucb respectively) (7,8). In addition to being very similar in size, the proteins of these operons show considerable amino acid homology. UmuD and MucA proteins are 41% homologous, UmuC and MucB proteins are 55 % homologous, these areas of homology being largely confined to blocks of amino acids which are almost completely conserved. The implication of these observations is clearly that the IncN plasmid muc genes and the chromosomal umu genes are evolutionarily related, and that they have probably evolved from a common ancestor. The similarities in the gene products leave little doubt that these two operons perform similar if not identical functions. In this paper, we describe the sequence analysis of a second plasmidencoded umuanalogue, encoded in this case by the imp genes of the Incld plasmid TPl10 (9). Although both this plasmid and pkm101 were originally derived from clinical isolates of Salmonella typhimurium, they can both be transferred very efficiently by conjugation to E. coli, where their UV protection and mutation properties are again manifested. We wished to determine whether the UV protection and mutation functions of this plasmid, which is completely unrelated to the IncN plasmids such as pkm 101, were also evolutionarily related to the umu system, or whether this was a functionally analogous system with no structural similiarities. The results clearly indicate that all three systems, umu, muc, and imp, share a common * To whom correspondence should be addressed

5046 Nucleic Acids Research, Vol. 18, No. 17 ancestry, but that there are some major differences between imp and the other two operons. The most obvious of these is that the imp operon contains a third gene, encoding a small protein of MW9,800. MATERIALS AND METHODS Bacterial Strains used included TG16(lacpro), supe, thi, hsdd5 [F' trad36 proa+b+ laciq lacz6m15] (10); and JL652as AB1 157, F thi], thrj, leu6, proa2, arge3, his4, lacyl, galk2, xyls, ara14, rpsl, sup37, but containing also F' laciq and pjl59 (J.Little, pers.comm) Plasmids: pkg10 (5.9kb (11)) contains the 3.2kb EcoRl Bglfl fragment of TP1 10 carrying the imp genes, cloned into the vector pt71. pdl2 (4.8kb; this work) contains the 2.4kb ClaIXbaI fragment of pkg1o carrying the impb gene, cloned into AccIXbaI digested pt72. pgpl2, pt71, and pt72 make up the T7 RNA polymerase / promoter controlled expression system of Tabor and Richardson (12). pjl59 (Ampr lexa+) is a LexA overproducing plasmid kindly provided by Dr J. Little. DNA Sequencing Sequencing of the imp genes was achieved by directed subcloning of fragments from pkg10 into M13 mpl8 and mpl9 as appropriate (13), the clones used being shown in Figure 1. The subclones were sequenced by the dideoxy chain termination method of Sanger et al (14), initially using the standard Klenow (E. coli DNA polymerase 1 large fragment) method and latterly the modified T7 DNA polymerase (Sequenase) method (United States Biochemical Corporation), (15). C band compressions were resolved by the substitution of ditp for dgtp as necessary. The products of the reactions were labelled with [a35s]datp (Amersham) and were subsequently analysed on buffer gradient gels (16). Single stranded DNA templates were prepared from PEG precipitated phage particles from the supernatent of an infected culture of strain TG1 (10). LexABinding LexA protein was purified from the overproducing strain JL652 (the generous gift of Dr John Little) using the method of Schnarr et al (17). Cells were lysed by sonication, cleared by centrifugation, and nucleic acid in the supernatent precipitated with Polymin P. Total protein was then precipitated with ammonium sulphate, dissolved, and subjected to chromatography on phosphocellulose. The peak fractions contained > 95 % pure LexA protein, as judged by stained SDSPAGE. These fractions were pooled and used for subsequent analysis. LexA DNA binding reactions contained the endlabelled imp DNA fragment, prepared as described below (2050,000 cpm per reaction), and purified LexA protein to give the required final concentration of between i05 and 101'M. Final reaction volumes of 120itd contained DNA, protein, and reaction buffer (100mM TrisHCl, ph7.4; 1.6mM NaEDTA; 0.04mg/mi bovine serum albumen; 20% glycerol; 25mM NaCl). The reaction was incubated at 370C for Smins, and then on ice for 15mins. 100,ul samples were loaded onto nondenaturing 5% polyacrylamide gels, made up in TBE buffer (0.09M Trisborate, 0.002M EDTA, ph 8.0). The gel was run in 1 x TBE at a constant current of 5OmA for 23 hours, dried and then exposed to Xray film at 70 C overnight or longer if required. Labelling of DNA fragments DNA fragments for use in gel retardation experiments were endlabelled using [ae35s]datp and T4 DNA polymerase in a replacement synthesis reaction, as described by Maniatis et al (18). Unincorporated material was removed by passage through Sephadex G50 (nucleic acid grade, PharmaciaLKB). Identification of PlasmidCoded Gene Products Proteins encoded by the imp genes were identified using the T7 RNA polymerase / promoter system as devised by Tabor and Richardson (12, and personal communication). Cells containing both the bacteriophage T7 RNA polymerase plasmid pgpi2, and a pt7 recombinant plasmid, were grown in Luria broth to midlog phase. A 0.2ml sample was taken, washed twice in M9 buffer, and resuspended in M9 buffer supplemented with 20tg/ml thiamine and 0.01 % (w/v) 18 amino acids (excluding cysteine and methionine). The cells were grown with shaking at 30 C for 60 mins, then the temperature shifted to 43 C for 15 mins. Rifampicin was then added to a final concentration of 200%g/ml, to inhibit host directed transcription, and the cells were incubated at 42 C for a further 10 mins. The temperature was then shifted down to 30 C for 20 mins, when 10,uCi 35Smethionine (Amersham) was added. Incubation was continued for 5 mins, the cells were pelleted, and labelled proteins analysed by SDS PAGE as described previously (9). RESULTS Sequencing the imp Operon We have previously described the location of the imp genes of TP1 10 within a 3.2kb EcoRl BglII fragment which is capable of fully expressing UV protection and mutation functions when cloned in high copy number vectors (9). Further subcloning defined the imp coding region as being completely contained on a 2.6kb EcoRl EcoRV fragment (11), and this fragment was chosen as the starting material for the determination of the DNA sequence. The strategy adopted for the sequence determination is shown in Figure 1. The complete sequence was determined on both strands using the dideoxy chain termination method of Sanger et al (14), as described above. The resulting sequence EcoRI NaeI PvuII ClaI _ J C _ PstI IMMMMAN. I4 44' ( EcoRV P PvuII ti NaeI HindIII SphI FL1 ~~~BglI KpnI A I B, 4 * 4 4 3 4 4 4 4 4 EcoRV.mj Figure 1. Gene organisation, restriction map and sequencing strategy of the imp operon of TP1 10. Subclones derived from the EcoRIEcoRV fragment (2.6kb) were used to determine the sequence on both strands for a distance of 2527 bp. from the EcoRl site. The extent of the sequenced fragments is shown as arrows. The positions of a number of six base pair restriction sites are shown, as are the positions of the three imp open reading frames, C, A, and B. Transcription across the operon proceeds in the left to right direction.

Nucleic Acids Research, Vol. 18, No. 17 5047 is shown in Figure 2. The restriction sites predicted from the product, we felt it important to ensure that these small fragments DNA sequence were in complete agreement with those found were true components of the imp operon, and were not artefacts earlier by restriction analysis (9), with the exception of an of the initial cloning strategy. EcoRIlEcoRV restriction digests additional PstI site at position 1685 (numbered from the EcoRI of TP110 DNA were therefore hybridised with labelled M13 site). This site, together with the site at 1607, generates a small clones containing these two small fragments, cloned individually. PstI fragment of 78 bp. which had not previously been detected: In each case, hybridisation was observed to the appropriate band, a further pair of PstI sites (at bases 629 and 790 in the sequence) confirming that the sequenced fragment represented an also generate a small PstI fragment. As the subclone carrying uninterrupted region of TP1 10. the imp operon was originally isolated as a PstI partial digestion Analysis of the sequence shown in Figure 2 reveals three major 1 TCCT C G GGCCGTGCCG CACCTGTTAC CGGGCGCCG 91 CAZOCGCGTO CATACGMTG CAGGSTCCCC MG~GGACmG AGAAACGTG CCCQAArGCCG ACCGGGTGAG GAGGCCGGCG ACAGCGACAC ACTTGCATCG GATGCAGCCC GGTTAAGTGC 181 aoocacc TOOOTMCCA GGTATTTTGT CCACAGAACC GTGCACAAAA T GCTOTOGAT AAGCTGGACA CAGCAGCCCA CAGCAGOCAT 271 ACAGCC&CAC ACTCAGAGM........ TAGCGTTT TTTAGTCACA ACAATGACAT TTGACIGTGT GCGGATTCAG TTATTAGAAT ACTSTATATA 361 CATACaCM MAMMOG AG&TGAGAAC 390 I. LxAa.. 391 ATG ATC CGC ATT GAA ATT CTT TTC GAC CGC CAG AGC ACA AAA AAC CTC AAA TCA GGC ACA CTT CAG GCG CTA CAG AAT N I R I K I L F D R Q S T K N L K S G T L Q A L Q N 469 GAA ATc GMA CAM CGT CTG AAA CCA CAC TAC CCG GMA ATC TGG CTG CGT ATA GAC CAG GGC AGC GCC CCG TCT GTA TCT K I W Q R L K P H Y P E I W L R I D Q G S A P S V S 547 GTC ACA GGA GCC CGC AAC GAC AAG GAT AAA GAA CGT ATC CTG TCT CTG CTG GAA GAA ATC TGG CAG GAC GAC AGC TGG V T G A R N D K D K E R I L S L L E E I W Q D D S W 625 CTG CCT GCA GCA TGA 639 L P A A Z 636 ATG AGT ACC GTT TAT CAC CGT CCG GCT GAC CCT TCA GGC GAT GAC AGT TAC GTG CGA CCG TTG TTT M S T V Y H R P A D P 8 G D D S Y V R P L F 702 GCC GAT CGT TGC CAG GCC GGT TTT CCT TCG CCT GCC ACT GAT TAT GCT GAG CAG GAA CTG GAT CTG AAC AGC TAT TGC A D R C Q A G F P s P A T D Y A E Q E L D L N S Y C 780 ATC AGC AGA CCT GCA GCC ACC TTC TTT CTG CGC GCC AGC GGT GAA TCG ATG AAC CAG GCT GGC GTG CAG AAT GGT GAT I S R P A A T F F L R A S G E S M N Q A G V Q N G D 858 CTG CTG GTA GTG GAC AGG GCC GAA AAA CCA CAA CAC GGG GAC ATC GTT ATC GCT GAG ATC GAC GGT GAG TTC ACC GTC L L V V D R A E K P Q H G D I V I A E I D G E F T V 936 AAA CGA CTG CTG TTG CGC CCA CGC CCG GCA CTG GAG CCG GTT TCA GAC AGC CCG GAG TTC CGC ACA CTG TAT CCG GAA K R L L L R P R P A L E P V S D S P E F R T L Y P E 1014 AAC ATC TGT ATT TTT GGT GTT GTC ACT CAC GTG ATA CAC AGG ACG CGG GAG TTA CGC TGA 1073 N I C I F G V V T H V I H AR...T, R E L R Z *D 1073 ATG TTT GCA CTG GCT GAT M F A L A D 1091 ATC AAC AGT TTC TAC GCC TCA TGT GAA AAA GTT TTC CGC CCG GAC CTT CGC AAC GAA CCG GTC ATC GTA CTC AGC AAT I N S F Y A S C E K V F R P D L R N E P V I V L S N 1169 AAC GAT GGC TGT GTG ATC GCG CGC AGC CCG GAG GCA AAA GCC CTT GGC ATC AGA ATG GGG CAG CCC TGG TTT CAG GTG N D G C V I A R S P E A K A L G I R M G Q P W F Q V 1247 AGA CAA ATG CGC CTG GAG AAG AAA ATA CAT GTA TTT TCC AGC AAT TAT GCG CTG TAC CAC AGC ATG AGC CAA CGG GTT R Q M R L E K K I H V F S S N Y A L Y H S M S Q R V 1325 ATG GCT GTT CTG GAG TCG CTT TCT CCC GCA GTT GAG CCC TAC TCA ATT GAT GAA ATG TTC ATT GAT TTG CGG GGG ATA M A V L E S L S P A V E P Y S I D E M F I D L R G I 1403 AAT CAT TGC ATC TCT CCG GAG TTT TTT GGT CAT CAG CTC AGG GAA CAG GTA AAG AGC TGG ACA GGA CTC ACC ATG GGG N H C I S P E F F G H Q L R E Q V K S W T G L T M G 1481 GTG GGC ATT GCG CCT ACA AAA ACG CTG GCT AAA AGT GCA CAG TGG GCA ACA AAG CM TGG CCA CAG TTT TCC GGA GTG V G I A P T K T L A K S A Q W A T K Q W P Q F S G V 1559 GTC GCG CTG ACG GCA GAA AAC CGT AAT CGG ATC TTG AAG CTA CTG GGG CTG CAG CCA GTT GGT GAG GTC TGG GGA GTA V A L T A E N R N R I L K L L G L Q P V G E V W G V 1637 GGA CAC AGA CTG ACG GAA AAG CTG AAT GCG CTG GGT ATT AAC ACA GCA CTG CAG CTG GCG CAG GCT AAC ACG GCA TTC G H R L T E K L N A L G I N T A L Q L A Q A N T A F 1715 ATC CGG MAMAAC TTC AGC GTC ATT CTT GAG CGT ACG GTA CGC GAA CTC AAC GGC GAG TCC TGC ATA TCC CTG GAA GAM I R K N F S V I L E R T V R E L N G E S C I S L E E 1793 GCA CCA CCG GCA AAA CAG CAG ATT GTC TGT AGT CGC AGT TTT GGT GAA CGA ATC ACA GAC AAA GAT GCC ATG CAC CAG A P P A K Q Q I V C S R S F G E R I T D K D A M H Q 1871 GCT GTT GTT CAG TAT GCA GAG CGG GCC GCA GAG AAA CTA CGT GGG GAG CGT CAG TAT TGC CGG CAG GTG ACG ACA TTT A V V Q Y A E R A A E K L R G E R Q Y C R Q V T T F 1949 GTA CGG ACA TCA CCC TTT GCA GTA AAA GAA CCC TGT TAC AGC AAT GCC GCT GTG GAA AAG CTT CCA TTG CCC ACA CAG V R T S P F A V K E P C Y S N A A V E K L P L P T Q 2027 GAC AGC CGG GAC ATT ATT GCC GCC GCA TGC AGA GCC TTA AAC CAT GTC TGG CGT GAA GGG TAC CGC TAT ATG AAG GCA D S R D I I A A A C R A L N H V W R E G Y R Y M K A 2105 GGT GTC ATG CTG GCT GAT TTC ACA CCA TCG GGT ATA GCG CAG CCG GGA TTA TTT GAT GAA ATC CAG CCC CGT AAA AAC G V M L A D F T P S G I A Q P G L F D E I Q P R K N 2183 AGT GAA AAG TTA ATG AAA ACA CTC GAT GAA CTG AAC CAG TCG GGA AAA GGG AAA GTG TGG TTT GCG GGG CGA GGA ACC S E K L M K T L D E L N Q S G K G K V W F A G R G T 2261 GCC CCT GAA TGG CAA ATG AAA CGG GAA ATG TTG AGT CAG TGT TAT ACA ACT AAA TGG CGA GAT ATT CCC CTG GCC AGG A P E W Q M K R E M L S Q C Y T T K W R D I P L A R 2339 CTG GGT TAG 2347 L G Z 2348 TTCAGTCATC TCCGAACATA TTTTCAGCGT TTCTTCTGGT CTGGTCTTCA CCGTCATTTC TCGACAGACT CTGCTCTGTA AGAGGAGTAC 2438 TGCAATCCAC AGACGTATGG AGACCAGATT GTTTTTCCAG TAATTCAAGC AATTTCTTTT CAACCGAAAA AGCACCTGGT ATTACGCTAA Figure 2. DNA sequence of the imp operon and predicted amino acid sequence of the gene products. The sequence starts at the EcoRI site shown in Figure 1. Structural features indicated in bold type include the likely promoter sequence (35 and 10), the ribosome binding site (SD), and the two LexA binding sites (LexAl and LexA2). The translation products are shown for each of the three imp open reading frames (impa, 391639; impb, 636 1073; impc, 1073 2347).

5048 Nucleic Acids Research, Vol. 18, No. 17 overlapping reading frames, downstream of a good consensus promoter sequence which is overlapped by two potential LexA binding sites. One of these overlaps the promoter 35 sequence, while the other overlaps the 10 region. Gel retardation studies using purified LexA protein and the 392 bp. EcoR1 Mbol imp fragment revealed only a singlestep binding reaction (Figure 3), and it appears likely that only one of these potential binding sites is strongly bound by LexA protein. This is most likely to be the site overlapping the promoter 10 sequence, since the central (NIO) of the CTG (NIo) CAG motif is in this instance very AT rich (90%), whilst the site which overlaps the 35 box contains only 50% AT base pairs. It is however possible that the 35 Lex box does play a role, particularly as a similar box with a 50% GC content also exists in front of the umudc operon (7). If LexA binding to such sites is weak, it is likely nqt to be detected under the conditions of the gel retardation assay. Open Reading Frames Within The Sequence The first open reading frame of the inp operon, designated impc, potentially encodes a protein of MW 9,491. The predicted amino acid sequence (Figure 2) shows no homology to the products of either the muc or umu operons. The second open reading frame, designated impa, potentially encodes a protein of MW16,201. This is similar in size to both UmuD (MW 15,064) and MucA (MW 16,371). Although there is little nucleotide sequence homology between these three genes, the degree of amino acid homology is striking (Figure 4). ImpA is 43 % homologous to MucA, and 49 % homologous to UmuD. The homology between MucA and UmuD is slightly lower at 40% Thirty one per cent of amino acids are common to all three proteins, this figure rising to fifty one per cent if conservative substitutions are allowed. As might be expected, this high degree of homology is achieved by the retention of blocks of amino acids which are essentially 100% conserved, these blocks being separated by more variable regions. It seems reasonable to assume that these blocks of conserved sequence represent sites essential for the mutagenic function of these proteins, while the variable sequences have other roles. Perry et al (7) demonstrated considerable amino acid homology between UmuD, MucA and the region of LexA protein surrounding its AlaGly (residues 84/85) cleavage site. This homology, as shown in Figure 4, extends to the amino acid sequence predicted for ImpA, and o 4F e A B C D E F G H 10>0 4 pq...j. 4 B lo?",, I'M 0 Figure 3. Electrophoretic mobility of the 413 bp EcoRI TaqI fragment of the imp operon in the presence of purified LexA protein. Lanes A and B contain the endlabelled DNA fragment, and no LexA protein. Lanes CI contain decreasing amounts of LexA, the concentration reducing tenfold in each track from 10 M in track C to 10 l M in track I. Arrows adjacent to track A indicate the positions of the molecular weight markers, and arrows labelled B and NB indicate the positions of the DNA band when the protein is either bound or not bound respectively. Electrophoresis conditions are described in the main text. includes the residues Serine and Lysine (residues 1 19 and 156 in LexA; 61 and 98 in ImpA), as well as the Ala and Gly (residues 27/28 in ImpA). In this latter respect, the ImpA protein resembles the MucA protein rather than UmuD in having an Ala Gly as opposed to a CysGly cleavage site. The serine and lysine residues in LexA have been shown to be important for the autolytic cleavage of the protein (19), and this conservation of sequence indicates that ImpA may undergo RecA mediated self cleavage in a manner similar to that demonstrated for LexA and UmuD (20,21). The third imp open reading frame, designated impb, potentially encodes a protein of MW 47,731, similar in size to UmuC (47,681) and MucB (46,362). Again, although there is little nucleotide homology between the coding regions, there is significant homology between the predicted amino acid sequences of the gene products (Figure 5). ImpB is 51 % homologous to MucB, and 54% homologous to UmuC. Homology between MucB and UmuC is slightly higher at 56%. In total, 43% of the amino acids are common to all three proteins, and if substitutions of similar amino acids is allowed, the figure rises to 57%. The impa and impb coding regions overlap by one base pair (Figure 2), in the same way as those of umud and umuc (7), which may have implications for translational coupling. There is an excellent match to the ShineDalgamo consensus sequence in the appropriate position (within impa) to promote efficient translation of impb. The impc and impa coding regions also overlap, but there is no obvious ShineDalgarno sequence upstream of the impa initiation codon. This may account for the apparent low level of translation of impa which has previously made it difficult to identify its product. U IuD X 1 f i: k: pad 1 r e i vt:: f P I f s ImpA s t v y h r p s g d dsyv r P = a MucA M k v di:: f: e s a svhs i P f y I d i v gq * * c G F P S P A a d Y v elqlr i dllln q 1 a r c q a G F P 8 P A t Y a eiq e 1 L 8 y q r i s a G F P S P A qggyy e k Q e 1 n L h e y * ~ * ~ * * * i q h P s A T y F v k a S G ds lti c; 8 r P a A T f F l r G ee n cvr h P s A T y F l r v S G s S e d g g i s q a g v q d g r i h d G D I L i V D s a i t a s H G d I V i A a v d n G DTL v V D r a e k p q HG a I V i A e a d GD vl Drs 1 t V ashgs IVvAc i h.q I F T V I k L q L R P t v q L i P m n s a y s g91 F T V K r L 1 L R P r p a L e P v s d s p e n K F T V K r L 1 L R P r p c L m P m n k d f p p it i 8s8 e d : t 1 d v f G V V i H v v k a m f r I y p e n ic : V V t H v i h r t v y y i d p d n e s v e i w G VV t H s 1 i e h r:. r : e r p v c r Figure 4. Sequence homology comparisons of the ImpA protein with UmuD and MucA. Amino acid residues which are conserved in all three equivalent gene products are shown in bold type and are flanked by bars. Sequences common to either UmuD and ImpA, or MucA and ImpA, are underlined. The AlaGly (Cys Gly in UmuD) Ser Lys residues involved in the autocleavage reaction are starred (*). Other residues common to ImpA and LexA are indicated by points (). Sequence data for UmuD and MucA are taken from Perry et al (7)

Nucleic Acids Research, Vol. 18, No. 17 5049 Identification of Gene Products In order to identify the products of the imp open reading frames, the coding region was inserted into suitable pt7 vectors and UmuC M F A L cd vn a f Y A S C E t v F R P D L w ImpBe F A L a D i N s t Y A S C E k V F R P D L r MUCB N F A L i D v N g m Y A S C E q a F R P D L a g k p V V V L S N N D G c v i A R n a e A K a 1 n e p V i V L S N N D G C V i A R s p e A R a n r a V a V L S N N D G n i v A R n y 1 A R k a G v k N G d P w F k q k d 1 G i r M G q P w;f q v r q m Gi1 k M G d P y F k v r p i f r r s g v v c F S r 1 e k k i h v F S i e r h n i a i F S S N Y e L Y a dn S n R v m s t 1 E e L s P r V S N Y a L Y h sn Sq R v m a v T E s L s p a V S N Y t L Y a sn Sa R f aa v v E s L a s h V E i Y S I D E af cd 1 t G v r n c r d 1 t d F E p Y S I D 13mF idt r G i n h c i s p e f F E q Y S I DE 1 Fv Dc k G i t a a m s 1 d a F C r e i R a t Vil q r T h L t v G C h q R e q V k s w T g L t m G C r q Re e V r r h T t L v c G A q T K A pit K V V G I A r T K T L A K 1 a n h A a Kk W q r g t g G V V d L s T L A K Sa q w A t Kq W P q f s G V V a L t T L A K 1 c n h A a K tw p a t g G V V a L d n 1 e R q r k L m n a 1 p v d d v w g i G r r i a e n R n r i L k 1 1 g 1 q p v g e v w G v c h d g ari k k L m s i 1 p v a e v w g v G h r t s k k 1 d a m g i k t v 1 d L a d t d i r f r 1 t e k 1 n a 1 g i n t a L q 1 a q a n t a? e k a 1 a t m g i k t v 1 d L a r a d t r i I R K h F n v E R T U R E L r G Ep Ci1 q L I R K n F s V i L E R T U R E L n G s C i s8l I R K t F g V v L E R T U R E L r G E a CppsL EE f a P t K Qle I i c S R S F GlelRli t d y p E E a p P a K Q q I v c S R S F G e R i t d k d E E n PRP a K Q q Ivv S R S F G q R v e t 1 t s NMr Q A i c s y,a a R a MNh Q A v v q y A e R AA E d Ntq Q A v t g f Aa R A A EK L R 9 h Q Y C K L R g E r Q Y C A A E K L R n Elr Q Y C R f i s t F i k T S P f a 1 n e py Y g N s A s R q v t tf v r T S P v k e p c Y s N a A v R v i s v F i r T S P y s v r d t q Y a N q A t v K L 1 t p T e K L p 1 p T e K L t v a T Q D S Rd II n A A c r A L n h Q D S R 3 I I a A A c r A L ln hv v Q D S R t I I q A A q A L a r i W q a, h r Y q R A G V M L g D F f s q v A Q W r e g y Y m A G V M L a D F t p s g i A Q W r e d i a Y a K A G V 1 L a D F : s g k e A Q 1 n L F D d n a P r p g S E q L M t v m D t N p g L F D e i q Pr k n SE k L M k t i D e T N 1 d L F D s a t P s a g S E a L M a v l D g i N a k e G r g t 1 y F A G q G i q q q w q NMk R a q s G k g k v w F A G r G t a p e w q N4 IFRe r r G k n q 1 f F A G q G i d n s f a r R q N L S p ry TT Rs: : R 8 d 1 1 r v k M L S q c Y T T k w R d i p T a ri1 g N L S p d Y T T d w R s i p i a t i k Figure 5. Sequence homology comparisons of the ImpB protein with UmuC and MucB. Amino acid residues which are conserved in all three equivalent genes are shown in bold type and are flanked by bars. Sequences common to either UmuC and ImpB, or MucB and ImpB, are underlined. Sequence data for UmuC and MucB are taken from Perry et al (7). expression was induced by providing T7 RNA polymerase in trans (12). Figure 6 shows the results of an analysis using two plasmid constructs, namely pkg1o, in which the 3.2kb EcoRIBglIH fragment is cloned in pt71, and pdl2, in which the ClaI EcoRV fragment encoding the 3' end of impa and the entire sequence of impb (Figure 1), is cloned in pt72. Comparison of the lanes reveals plasmid coded proteins of approximately lokd, 16kD and 5OkD in the pkg1o track, but suprisingly no plasmid coded products in the pdl2 track. We have previously identified proteins of approximately 1 lkd and 5 1kD as imp products (9), which were at that time suggested to be the products of the impa and imp genes respectively. It is clear from the sequence data that this is unlikely to be the case, and that while the assignment of the 5OkD protein to imp is still valid, the lokd protein is almost certainly the product of the impc gene. The use of the pt7 system allows the visualisation of a 16kD protein not previously seen, which is consistent in size with the potential product of the impa gene. Additional experiments using impc + b(impab) derivatives (containing either the EcoRI ClaI or EcoRIPstI fragments (Figure 1) confirmed unequivocally that the lokd protein is produced not by impa, but by impc (data not shown). Al% ;_ 69 46 30O 14 MN, li 1 2 3 4 5 6 7 Figure 6. Polyacrylamide gel electrophoresis analysis of imp encoded proteins produced from T7 expression plasmids. Lanes 1, 2, and 3 show proteins produced by the plasmid pdl2 (impcab+), lanes 5, 6, and 7 show proteins produced by pkg1o (impc+a+b+). The construction of the two plasmids is described in the main text. Lanes 3 and 7 show proteins prepared from cultures which have not been induced for T7 RNA polymerase. Lanes 2 and 6 contain samples prepared from induced cells, and lanes 1 and 5 contain samples from induced cells to which rifampicin (200Og/ml) has been added to inhibit host mediated transcription. Lane 4 contains molecular weight markers, the sizes of which are indicated in kilodaltons. Arrows indicate the three imp gene products in lane 5, with approximate sizes of lokd (ImpC), 16kD (ImpA) and 47kD (ImpB).

5050 Nucleic Acids. Research, Vol. 18, No. 17 The lack of visible impcoded gene products in the pdl2 track was an unexpected observation, since this plasmid contains an intact impb gene which is known from genetic complementation to be completely functional (Lodwick and Strike, manuscript in preparation). It seems likely that the high level expression of this gene cannot be achieved in this construct, perhaps due to the lack of translational coupling which normally occurs when the impc and impa genes are present. DISCUSSION The DNA sequence of the imp operon reveals a close relationship to the two previously characterised UV protection / mutation operons muc and umu. However, despite the fact that both imp and muc are found on plasmids originally isolated from Salmonella typhimurium, while umu occupies a chromosomal location in E. coli, imp and muc show no greater homology to each other than to umu. Each of these three operons codes for proteins of 16kD and 47kD, which are essential for the UV protection / mutation effect. The high degree of conservation of blocks of amino acids amongst the three 16kD proteins and amongst the three 47kD proteins clearly indicates those regions of the protein essential for function. However, it is also clear that while these systems are closely related, they are quite distinct from each other. The low level of homology between the operons at the DNA level prevents crosshybridisation, and gives a clear indication of the evolutionary divergence which must have taken place from their (presumed) common ancestor. This divergence is also indicated by the failure of the component genes of the three operons to effectively cross complement. There is no cross complementation between the muc and umu component genes (1), and only a very limited complementation between imp and umu (Lodwick and Strike, manuscript in preparation). The major difference between the imp operon and the muc I umu operons however is the presence of a third gene, designated impc, coding for a small protein of MW 9,491, for which no homologue exists in the other two operons. The biological function of this protein is not yet clear, although our initial experiments indicate that it may be involved in the restoration of the induced imp operon to the repressed state, once repair of DNA damage is completed. It does not however share homology with the plasmidcoded psib function, which is reported to have a similar role (22), nor have we been able to detect any significant homology to proteins of known function in the SwissProt data base. We have however demonstrated by the creation of deletions that impc is not absolutely required for the UV protection and mutation effects (Lodwick and Strike, manuscript in preparation). In terms of evolution of the UV protection / mutation functions, it is not possible to say whether the imp operon has acquired this extra sequence, or whether the umu and muc operons have lost it. It is perhaps relevant however that imp like sequences are widely distributed amongst most Incl1, IncB and IncFIV plasmids (Lodwick and Strike, manuscript in preparation), both in plasmids derived from recent clinical infections, and from plasmids isolated before the general use of antibiotics (11). In each of these cases, sequences equivalent to impc are present and have not been lost. The ImpC protein does appear to be toxic to the cell when produced constitutively in large quantities, and also appears to interfere with the SOS induction process (Lodwick and Strike, manuscript in preparation). Experiments are currently in progress to further characterise its biological properties. ACKNOWLEDGEMENTS This work was supported by a project grant from the Medical Research Council to P.S. The excellent technical assistance of Jenni Jones is gratefully acknowledged. We are particularly grateful to Dr John Little for the supply of LexA overproducing strains, for the supply or purified LexA protein, and for help and advice on purification procedures. Thanks are also due to Dr Stan Tabor for kindly supplying the pt7 expression plasmids. REFERENCES 1. Walker,G.C. (1984) Microbiol. Rev., 48, 6093 2. Bridges, B.A. and Woodgate, R. (1984) Mol. Gen. Genet., 1%, 364366 3. Tessman, I. (1985) Proc. Natl. Acad. Sci. USA, 83, 66146618 4. Christensen, J.R., Le Clerc, J.E., Tata, P.V., Christensen, R.B. and Lawrence, C.W. (1988) J. Mol. Biol., 203, 635641 5. Sedgwick, S.G. and Goodwin, P.A. (1985) Proc. Natl. Acad. Sci. USA, 82, 41724176 6. McCann, J., Spingarn, N.E., Kobori, J. and Ames, B.N. (1975) Proc. Natl. Acad. Sci. USA, 72, 979983 7. Perry,K.L., Elledge, S.J., Mitchell, B.B., Marsh, L. and Walker, G.C. (1985) Proc. Natl. Acad. Sci. USA, 82, 43314335 8. Kitagawa, Y., Akaboshi, E., Shinagawa, H., Horii, T., Ogawa, H. and Kato, T. (1985) Proc. Natl. Acad. Sci. USA, 82, 43364340 9. Glazebrook, J.A., Grewal, K.K. and Strike, P. (1986) J. Bacteriol., 168, 251256 10. Carter, P., Bedouelle, H. and Winter, G. (1985) Nucleic Acid Res., 13, 4431 4443 11. Sedgwick, S., Thomas, S.M., Hughes, V.M., Lodwick, D. and Strike, P. (1989) Mol. Gen. Genet., 218, 323329 12. Tabor, S. and Richardson, C.C. (1985) Proc. Natl. Acad. Sci. USA, 82, 10741078 13. Messing, J. and Viera, J. (1982) Gene, 19, 269276 14. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA, 71, 54635467 15. Tabor, S. and Richardson, C.C. (1987) Proc. Natl. Acad. Sci. USA, 84, 4767 4771 16. Biggin, M.D., Gibson, T.J. and Hong, G.F. (1983) Proc. Natl. Acad. Sci. USA, 80, 39633965 17. Schnarr, M., Pouyet, J., GranigerSchnarr, M. and Daune, M. (1985) Biochemistry, 24, 28122818 18. Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982). Molecular Cloning: A Laboratory Manual. Cold Spring Harbor University Press, Cold Spring Harbor. 19. Slilaty, S.N. and Little, J.W. (1987) Proc. Natl. Acad. Sci. USA, 84, 39873991 20. Burckhardt, S.E., Woodgate, R., Scheuermann, R.H. and Echols, H. (1988) Proc. Natl. Acad. Sci. USA, 85, 18111815 21. Shinagawa, H., Iwasaki, H., Kato, T. and Nakata, A. (1988) Proc. Natl. Acad. Sci. USA, 82, 10741078 22. Dutreix,M., Backman,A., Celerier,J., Bagdasarian,M.M., Sommer,S., Bailone,A., Devoret,R., and Bagdasarian,M. Nucleic Acids Res., 16, 10669 10679