araB Gene and Nucleotide Sequence of the araC Gene of Erwinia carotovora

JOURNAL OF BACTERIOLOGY, Nov. 1985, p. 717-722, Vol. 164, No. 2
0021-9193/85/110717-06$02.00/0
Copyright 1985, American Society for Microbiology

SHAU-PING LEI, HUN-CHI LIN, LAUREL HEFFERNAN, AND GARY WILCOX*
Department of Microbiology and Molecular Biology Institute, University of California, Los Angeles, California 90024
Received 7 May 1985/Accepted 7 August 1985
* Corresponding author.

The araB and araC genes of Erwinia carotovora were expressed in Escherichia coli and Salmonella typhimurium. The araB and araC genes in E. coli, E. carotovora, and S. typhimurium were transcribed in divergent directions. In E. carotovora, the araB and araC genes were separated by 3.5 kilobase pairs, whereas in E. coli and S. typhimurium they were separated by 147 base pairs. The nucleotide sequence of the E. carotovora araC gene was determined. The predicted sequence of AraC protein of E. carotovora was 18 and 29 amino acids longer than that of AraC protein of E. coli and S. typhimurium, respectively. The DNA sequence of the araC gene of E. carotovora was 58% homologous to that of E. coli and 59% homologous to that of S. typhimurium, with respect to the common region they share. The predicted amino acid sequence of AraC protein was 57% homologous to that of E. coli and 58% homologous to that of S. typhimurium. The 5' noncoding regions of the araB and araC genes of E. carotovora had little homology to either of the other two species.

The L-arabinose gene-enzyme complex in both Escherichia coli (8, 9, 16) and Salmonella typhimurium (13, 19) has been characterized. The araC gene encodes a positive and negative regulator of the araBAD operon that is also a negative regulator of its own synthesis (4, 12, 17, 21). The regulation and organization of the araC and araBAD genes in the two species are very similar. The DNA sequences of the araC genes are 82% homologous, and the predicted amino acid sequences of the AraC proteins are 92% conserved (6).

Biochemical studies (3, 10) demonstrate that at the DNA level the gram-negative enteric bacterium Erwinia sp. is only 50% homologous to E. coli (10). To initiate a study of the organization and regulation of the ara genes in other members of the family Enterobacteriaceae, we cloned the araB and araC genes from E. carotovora. The distance between the araB and araC genes was determined, and the araC gene was sequenced.

MATERIALS AND METHODS

Strains and plasmids. The E. coli bacterial strains and plasmids used in this study are listed in Table 1. Wild-type E. carotovora was obtained from A. K. Chatterjee, Kansas State University. Plasmids were introduced into E. coli strains by transformation (7). Plasmid DNA was prepared as described previously (18).

TABLE 1. E. coli strains and plasmids

Strain or plasmid    ara genotype      Origin or reference
E. coli strains^a
  HB101              ara-14            1
  LA310              Δ(araB)809        14
  LA3                Δ(araOC)719       11
  LA840              araC::lacZ        Laura Cass, this laboratory
  MC1061             Δ(araDABPOC)      5
  SB7223             Δ(araABPOC)744    27
Plasmids^b
  pSH100             araC+B+A'         This paper
  pSPL8              araC+             This paper
  pSPL10             araC+             This paper
  pSPL11             araB+             This paper
  pSPL147            araC+             This paper
  pSPL148            araC+             This paper
  pJB8               -                 15
  pBR322             -                 2
  pHL2               araC+B+A+D+       19
  pTB1               araC+B+           11
  pUC13              -                 25

a All the LA strains are derivatives of E. coli K-12 RR1 and contain the ara region from E. coli B/r. SB7223 is also a derivative of E. coli K-12 that contains the ara region from E. coli B/r.
b The ara genes of pHL2 and pTB1 are from S. typhimurium and E. coli B/r, respectively.

Media, chemicals, and enzymes. MacConkey agar base (Difco Laboratories) was supplemented with 1% L-arabinose (Sigma Chemical Co.). Restriction enzymes were purchased from Bethesda Research Laboratories, Inc., and T4 DNA ligase was purchased from Boehringer Mannheim Biochemicals. [α-32P]dATP and L-[35S]methionine were purchased from Amersham Corp. When required, ampicillin was added to the medium at 50 µg/ml.

Plasmid constructions. E. carotovora chromosomal DNA was partially digested with EcoRI and ligated with EcoRI-linearized pBR322. Ligated DNA was introduced into araB or araC mutants of E. coli by transformation. The cells were spread on MacConkey agar-arabinose-ampicillin plates, and the plates were incubated at 37°C. Colonies that appeared red after 20 h of incubation were scored as complementation positive. The ara genes were also detected by complementation of ara mutants after transformation with the cosmid library of E. carotovora that was prepared previously (18).

Construction of BAL 31 asymmetric deletion mutants. Plasmid pSPL147 (containing the araC gene) was first digested with EcoRV (Fig. 1). The linearized DNA was then digested with BAL 31, and samples were removed at different time points. After phenol extraction and ethanol precipitation, the DNA was digested with HpaI and ligated with HincII-digested pUC13. Ligated DNA was introduced into the E. coli strain containing the Δ(araOC)719 mutation

by transformation. The cells were spread on MacConkey agar-arabinose-ampicillin indicator medium, and the plates were incubated at 37°C for 20 h.

FIG. 1. Plasmid constructions. The thin curved lines indicate vector DNA, and the thick curved lines indicate chromosomal DNA. The curved arrow indicates transcription orientation.

Organization of the araB and araC genes. Southern blot hybridizations were used to define the organization of the araB and araC genes in E. carotovora. A 1.4-kilobase-pair (kb) DNA fragment that contains the entire araC gene was isolated from pSPL8 (Fig. 1 and Fig. 2) and used as an araC probe. A 3.5-kb EcoRI fragment that contains the araB gene was isolated from pSPL11 and used as an araB probe. Both the araC and araB probes were nick translated as described previously (22). Plasmids pSH100 (araB and araC), pSPL10 (araC), and pSPL11 (araB) were digested with various restriction enzymes, and the fragments were separated on a 1% agarose gel. The DNA was bi-blotted (24) onto nitrocellulose paper and hybridized with the araC or araB probes, respectively.

DNA sequence analysis. The DNA sequence was determined by the Sanger dideoxy chain-termination method (23). Plasmid pSPL8 (Fig. 1) was digested with EcoRI and HindIII and ligated with EcoRI- and HindIII-digested M13mp10 or M13mp11 to obtain both orientations. Nonrandom deletions were generated in the M13mp10 and M13mp11 derivatives (20). The DNA sequence beyond the araC structural gene and the HpaI site was determined with an M13mp8 derivative containing a 900-base-pair TaqI DNA fragment (Fig. 2).

In vitro analysis of plasmid-encoded proteins. E. coli SB7223 was used to prepare an S-30 fraction as described previously (27) except that the cells were broken in a French pressure cell at 5,000 lb/in2. The in vitro protein synthesis reactions were performed as described previously (19).

RESULTS

Cloning of araC and araB genes. An E. carotovora library was constructed in pBR322 from a partial EcoRI digestion of total DNA as described in Materials and Methods. Plasmid pSPL10 containing the E. carotovora araC gene was identified from the library by its ability to complement an E. coli strain containing the Δ(araOC)719 mutation. The insert in pSPL10 is 16 kb and contains an internal EcoRI site (Fig. 1). Each of the EcoRI fragments derived from the pSPL10 insert was subcloned in pBR322. Neither subclone could complement the araC mutation. This suggested that the EcoRI site was within the araC gene. A 3-kb AvaI fragment from pSPL10 was subcloned into the AvaI site of pBR322. The resulting plasmid, designated pSPL147 (Fig. 1), complemented the Δ(araOC)719 mutation (Table 2). Plasmid pSPL147 could also repress lacZ expression when the plasmid existed in trans to an E. coli araC::lacZ chromosomal fusion strain (Table 2).

TABLE 2. Identification and characterization of the cloned ara genes of E. carotovora^a

Column groups: ability of strain [genotype] to utilize arabinose: MC1061 [Δ(araDABPOC)], LA3 [Δ(araOC)719], LA310 [Δ(araB)809]; ability of LA840 (araC::lacZ) to utilize arabinose and lactose.

Plasmid                 Results
No plasmid              - - +
pSPL11 (araB)           + - +
pSPL147 (araC)          - + +
pSH100 (araB araC)      - + + ND^b ND

a The cells transformed with the plasmids were plated on MacConkey agar-arabinose-ampicillin or MacConkey agar-lactose-ampicillin plates. Symbols: +, ability to utilize arabinose or lactose; -, inability to utilize these carbohydrates.
b ND, Not determined.

An araB-containing plasmid (pSPL11) was identified from the EcoRI library by its ability to complement an E. coli strain containing the Δ(araB)809 mutation (Table 2). Plasmid pSPL11 contained a 3.5-kb EcoRI insert. Plasmid pSH100 was identified from a cosmid library of E. carotovora (18) by its ability to complement the E. coli Δ(araOC)719 mutation. Subsequently, plasmid pSH100 was also found to complement the Δ(araB)809 mutation (Table 2). The DNA insert in pSH100 is approximately 35 kb. However, none of the plasmids from either library was able

FIG. 2. Comparison of the organization of the ara operon and araC gene of E. carotovora, E. coli, and S. typhimurium. The relative position of each gene is indicated by A, B, C, and D; the transcription orientation is indicated by an arrow. The 21-kb DNA fragment was derived from plasmid pSH100. The DNA fragment derived from pSPL8 and used as the araC probe for Southern blot hybridizations is indicated. BamHI and HindIII sites on the vector pSPL8 (Fig. 1) were used to excise the probe. The DNA fragment used as the araB probe was from pSPL11. The araB probe is the leftmost 3.5-kb EcoRI fragment. The restriction map of S. typhimurium is from Lin et al. (19), and that of E. coli is from Greenfield et al. (11).

FIG. 3. DNA sequence and predicted amino acid sequence of the araC gene of E. carotovora. The DNA sequence of the araC gene was determined from a 1.4-kb DNA fragment of pSPL8 by a method described previously (20). Numbering starts from the first nucleotide of the first methionine coding for AraC protein. The GGAG sequence in the box indicates the presumed ribosome-binding site.
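The pairing of codons and residues in Fig. 3 can be illustrated with a short sketch. It translates the first 16 codons of the open reading frame as reported in Fig. 3 with the standard genetic code and lists the positions at which methionine occurs. The second codon is taken to be TAT, which is an assumption, and the names CODE, ORF_START, and translate are illustrative rather than taken from the paper.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# First 16 codons of the araC open reading frame (second codon assumed TAT).
ORF_START = ("ATG TAT CAC CGT ATG GCG CAT GAA "
             "TCT CAG CCT AAT CCA CTG TTG CCG").split()


def translate(codons):
    """Return the one-letter protein string for a list of DNA codons."""
    return "".join(CODE[codon] for codon in codons)


protein = translate(ORF_START)
# 1-based residue positions that are methionine.
met_positions = [i for i, aa in enumerate(protein, start=1) if aa == "M"]

print(protein)        # MYHRMAHESQPNPLLP
print(met_positions)  # [1, 5]
```

Methionine appears at positions 1 and 5 of this stretch, which are the two candidate translation start sites the paper weighs against each other.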

to complement the Δ(araDABPOC) mutation in strain MC1061, indicating that they do not contain all of araA and araD (Table 2).

Organization and restriction map of the araB and araC genes. To determine the location of the araC and araB genes on plasmid pSH100, Southern blot hybridizations were performed (26). The restriction maps of the araB and araC genes are shown in Fig. 2. A 6-kb BssHII fragment in pSH100 was identified that hybridized with both the araB probe (the 3.5-kb EcoRI fragment of pSPL11) and the araC probe (the 1.4-kb HindIII-BamHI fragment of pSPL8) (Fig. 2). The locations of the BssHII sites were identified in both pSPL147 and pSPL11. The location and orientation of the araB and araA genes were determined by DNA sequence analysis of the termini of the 3.5-kb EcoRI fragment of pSPL11 (data not shown). By comparing the sequence of each end of this fragment to the sequence of the araB and araA genes of S. typhimurium, regions of homology in araB (base pairs 1 to 90 from the beginning of the araB gene) and araA (base pairs 1099 to 1449 from the beginning of the araA gene) were found. The sequence of part of the 2-kb EcoRI fragment of pSPL10 was homologous to the araC gene of S. typhimurium (see below). From the location of the BssHII sites, the orientation of the araB gene in pSPL11, and the sequence homology, we conclude that there is about 3.5 kb between the araB and araC genes in E. carotovora (Fig. 2) and that they are transcribed in opposite directions.

DNA sequence analysis of the araC gene. The complementation analysis of plasmid pSPL10 and its EcoRI subclones suggested that the EcoRI site was within the araC gene. The DNA sequence from the EcoRI site towards the BssHII site was determined and was homologous to the araC gene of E. coli and S. typhimurium, confirming that the EcoRI site was within araC. DNA sequence analysis was carried out in both directions from the EcoRI site. The DNA sequence of the araC gene of E. carotovora and the predicted amino acid sequence of AraC protein are shown in Fig. 3. A presumed ribosome-binding site, GGAG, was followed by an open reading frame that is homologous to the araC gene from E. coli and S. typhimurium. The predicted AraC protein contains 310 amino acid residues and has a calculated molecular weight of 35,144.

In vitro analysis of plasmid-encoded protein. The size of E. carotovora AraC protein made in vitro was estimated by analysis on sodium dodecyl sulfate-polyacrylamide gels. Attempts to identify AraC protein with plasmid pSPL147 as the DNA template were unsuccessful, probably because the location of AraC protein was obscured by β-lactamase. Therefore, the araC-containing AvaI fragment of pSPL147 (Fig. 1) was blunt ended with the large fragment of DNA polymerase I and then cloned into the ScaI site of pBR322, thus destroying the bla gene. The resulting plasmid, designated pSPL148 (Fig. 1), was used to direct protein synthesis. Subsequent gel electrophoresis and autoradiography allowed us to identify AraC protein and determine that its molecular weight was 36,000 (Fig. 4).

FIG. 4. In vitro analysis of the molecular weight of AraC protein of E. carotovora. Autoradiograph of the protein synthesized in vitro. Each S-30 reaction mixture (6 µl) prepared as described in Materials and Methods was mixed with loading buffer, heated at 100°C for 5 min, and loaded on a 10% sodium dodecyl sulfate-polyacrylamide (acrylamide:bis 30:0.8) gel. Plasmid pHL2 contains the araC gene from S. typhimurium which codes for a protein of 281 amino acids with a molecular weight of 32,034 (6). This plasmid codes for β-lactamase. Plasmid pSPL148 contains the araC gene from E. carotovora and does not code for β-lactamase. Plasmid pTB1 contains the araC gene from E. coli which codes for a protein of 292 amino acids and has a molecular weight of 33,315 (11). This plasmid does not produce β-lactamase. The location of AraC protein of S. typhimurium in this gel system was determined previously (19). The location of each AraC protein is indicated by the horizontal lines.

DISCUSSION

The araB and araC genes of E. carotovora were identified from both the pBR322 and cosmid libraries. The location of the araC gene in pSPL10 indicates that approximately 1.5 kb separates araC from the EcoRI site immediately to the left of araC in Fig. 1. The observation that expression of the araB gene in pSPL11 requires the araC regulatory protein indicates that pSPL11 contains a functional promoter for the araB gene. In conjunction with the results of the Southern blot hybridization of pSH100 with araB or araC probes, we concluded that the araC and araB genes in E. carotovora are separated by approximately 3.5 kb. A comparison of the DNA sequence of the araC and part of the araB genes of E. carotovora with the DNA sequence of S. typhimurium (6, 19) indicates that the orientation of araC relative to araB genes is identical in all three organisms. Thus, the genes must also be divergently transcribed in E. carotovora (Fig. 2). The approximately 3.5 kb that separates the araC and araB genes in E. carotovora is quite different from the situation for E. coli and S. typhimurium, in which the genes are adjacent to one another and share a common controlling region of 147 base pairs.

The araC gene of E. carotovora and its flanking regions were sequenced. An open reading frame that followed a presumed ribosome-binding site was identified. This open reading frame codes for a protein of 310 amino acids that is homologous to the predicted AraC protein sequence of S. typhimurium and E. coli. The methionine at position five corresponds to the N-terminal amino acid of the AraC protein of both E. coli and S. typhimurium. We tentatively conclude that the AraC protein of E. carotovora starts at the

first methionine of the open reading frame, because there is no Shine-Dalgarno sequence upstream from the methionine at position five.

The predicted amino acid sequences of the AraC protein from E. coli, S. typhimurium, and E. carotovora are compared in Fig. 5. In the common regions, 57% of the sequences are conserved between E. carotovora and E. coli, compared with 58% between E. carotovora and S. typhimurium. The regions which had more differences are located at amino acids 56 to 72, 90 to 95, 114 to 153, 172 to 180, and 191 to 204. The sequence between amino acids 114 and 153 is especially variable. Since there is very little sequence homology and two deletions and one insertion, perhaps this region is not important for AraC function except as a spacer. However, the regions between amino acids 13 and 55, 73 and 89, 96 and 113, 154 and 174, 181 and 190, and especially 205 and 283 are conserved in these three species. This is consistent with an earlier report (6) indicating sequence conservation in the amino- and carboxy-terminal regions of the AraC proteins from E. coli and S. typhimurium. In addition, several non-self-regulating araC mutants have been identified in S. typhimurium. These mutations are located in the carboxy-terminal region of the AraC protein and are at residues which are conserved in all three organisms (P. Clarke, J.-H. Lee, K. Burke, and G. Wilcox, unpublished data). Thus, this region might contain sequences required for DNA binding.

FIG. 5. Comparison of the amino acid sequences of the araC genes of E. carotovora, E. coli, and S. typhimurium. Amino acids are given in the single letter code: A, alanine; C, cysteine; D, aspartic acid; E, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; and Y, tyrosine. Symbols: Δ, deletion; :, identical amino acid; -, no comparable region because initiation and termination are at different locations.

TABLE 3. Codon usage for the araC genes of E. carotovora, E. coli, and S. typhimurium^a

AA    Codon   EC    E    S      AA    Codon   EC    E    S
Phe   TTC      8    6    6      Ala   GCG      5    6    7
      TTT      9   11   11            GCT      3    3    1
Leu   CTG     16   13   13            GCA      4    2    1
      TTG      9    5    7            GCC      6   10   10
      CTC      5    5    2      Tyr   TAT      9    5    9
      CTA      1    0    2            TAC      2    5    1
      TTA      2    4    3      His   CAC      2    4    3
      CTT      0    4    3            CAT      6    8    8
Ile   ATC      6    6    9      Arg   CGT      8    5    4
      ATT     11    8    6            CGC     13   10    9
      ATA      1    2    2            CGA      3    3    1
Met   ATG      8    5    5            AGG      1    0    1
Val   GTT      2    4    3            AGA      1    0    1
      GTC      4    4    5            CGG      3    4    5
      GTA      1    3    3      Lys   AAA      6    5    3
      GTG      7    4    1            AAG      2    2    1
Ser   TCC      2    0    1      Asp   GAC      6    5    5
      TCT      1    0    1            GAT     10    9    9
      TCG      5    6    3      Glu   GAA     14    9    6
      AGC     10   10   12            GAG      5    9   10
      TCA      2    3    2      Cys   TGT      2    2    2
      AGT      3    0    1            TGC      2    3    3
Pro   CCT      3    1    3      Trp   TGG      7    6    6
      CCA      6    2    1      Gly   GGA      4    4    8
      CCG      8   13   12            GGT      6    9    1
      CCC      3    1    1            GGC      4    0    6
Thr   ACG      6    2    3            GGG      6    8    6
      ACC      3    5    4      Asn   AAT      8   10    9
      ACT      1    0    1            AAC      4    4    1
      ACA      0    0    2      End   TAA      1    1    1
Gln   CAG     15    9   14            TAG      0    0    0
      CAA      0    6    2            TGA      0    0    0

a AA, Amino acid; EC, E. carotovora; E, E. coli; and S, S. typhimurium.

The structures of the ara genes in different organisms were compared. The 5' noncoding regions of the araC gene in both E. coli and S. typhimurium were highly conserved. Surprisingly, in E. carotovora the DNA sequence upstream of the predicted translation initiation site has very little homology to either E. coli or S. typhimurium sequences. However, the araC gene and part of the araB and araA (S.-P. Lei, unpublished result) genes are about 56% homologous between E. carotovora and S. typhimurium. In addition, the codon usage in the araC genes of these three species is not very different except that in E. carotovora there is no CTT codon used for leucine, and the AGT and the CCA codons are more frequently used for serine and proline, respectively (Table 3). The predicted amino terminus of the AraC protein from E. carotovora is four amino acid residues longer than that from E. coli and S. typhimurium. The predicted carboxy terminus of the AraC protein from E. carotovora is 14 and 25 amino acid residues longer than that from E. coli and S. typhimurium, respectively.
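The protein lengths and terminal extensions quoted in the text can be cross-checked against the codon counts in Table 3. The sketch below sums the sense codons in each species column and compares the totals with the stated extensions (4 residues at the amino terminus, 14 or 25 at the carboxy terminus). The counts are transcribed from Table 3; names such as TABLE3 and residue_totals are illustrative and not from the paper.

```python
# Codon counts transcribed from Table 3, as repeating groups of
# codon, E. carotovora, E. coli, S. typhimurium.
TABLE3 = """
TTC 8 6 6     TTT 9 11 11
CTG 16 13 13  TTG 9 5 7    CTC 5 5 2    CTA 1 0 2   TTA 2 4 3   CTT 0 4 3
ATC 6 6 9     ATT 11 8 6   ATA 1 2 2
ATG 8 5 5
GTT 2 4 3     GTC 4 4 5    GTA 1 3 3    GTG 7 4 1
TCC 2 0 1     TCT 1 0 1    TCG 5 6 3    AGC 10 10 12  TCA 2 3 2  AGT 3 0 1
CCT 3 1 3     CCA 6 2 1    CCG 8 13 12  CCC 3 1 1
ACG 6 2 3     ACC 3 5 4    ACT 1 0 1    ACA 0 0 2
CAG 15 9 14   CAA 0 6 2
GCG 5 6 7     GCT 3 3 1    GCA 4 2 1    GCC 6 10 10
TAT 9 5 9     TAC 2 5 1
CAC 2 4 3     CAT 6 8 8
CGT 8 5 4     CGC 13 10 9  CGA 3 3 1    AGG 1 0 1   AGA 1 0 1   CGG 3 4 5
AAA 6 5 3     AAG 2 2 1
GAC 6 5 5     GAT 10 9 9
GAA 14 9 6    GAG 5 9 10
TGT 2 2 2     TGC 2 3 3
TGG 7 6 6
GGA 4 4 8     GGT 6 9 1    GGC 4 0 6    GGG 6 8 6
AAT 8 10 9    AAC 4 4 1
TAA 1 1 1     TAG 0 0 0    TGA 0 0 0
"""
STOP_CODONS = {"TAA", "TAG", "TGA"}

tokens = TABLE3.split()
# Map each codon to its (E. carotovora, E. coli, S. typhimurium) counts.
counts = {tokens[i]: tuple(int(n) for n in tokens[i + 1:i + 4])
          for i in range(0, len(tokens), 4)}


def residue_totals(table):
    """Sum the sense-codon counts in each species column."""
    sense = [row for codon, row in table.items() if codon not in STOP_CODONS]
    return tuple(sum(column) for column in zip(*sense))


totals = residue_totals(counts)
print(totals)  # (310, 292, 281)

# Extensions stated in the text, relative to E. carotovora AraC.
N_TERMINAL_EXTRA = 4
C_TERMINAL_EXTRA = {"E. coli": 14, "S. typhimurium": 25}
print(N_TERMINAL_EXTRA + C_TERMINAL_EXTRA["E. coli"])         # 18
print(N_TERMINAL_EXTRA + C_TERMINAL_EXTRA["S. typhimurium"])  # 29
```

The column totals reproduce the 310, 292, and 281 residues reported for the three proteins, and the summed extensions match the 18 and 29 extra residues given in the abstract. The same dictionary also shows the codon-usage points made in the text, for example that CTT is not used in the E. carotovora gene.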

Although the degree of heterogeneity of the araC DNA sequences is greater between E. carotovora and E. coli than between E. coli and S. typhimurium, the ara genes from E. carotovora can be expressed in E. coli (Table 2) and S. typhimurium (data not shown). The results shown in Table 2 indicate that the AraC regulatory protein of E. carotovora has activator function on the araBAD operon and represses araC expression. Thus, the regulatory functions of AraC protein remain the same, even though the degree of heterogeneity of the DNA sequence of the araC gene is high.

ACKNOWLEDGMENTS

We thank Patrick Clarke for providing the S-30 used in the in vitro analysis and for discussion of his unpublished data. We thank Lori Stoltzfus, Laura Cass, and Patrick Clarke for comments on the manuscript.

This research was supported by Public Health Service grants GM 30491 and GM 30896 from the National Institute of General Medical Sciences.

LITERATURE CITED

1. Bolivar, F., and K. Backman. 1979. Plasmids of Escherichia coli as cloning vectors. Methods Enzymol. 68:245-267.
2. Bolivar, F., R. L. Rodriguez, P. J. Greene, M. C. Betlach, H. L. Heyneker, H. W. Boyer, J. H. Crosa, and S. Falkow. 1977. Construction and characterization of new cloning vehicles. II. A multipurpose cloning system. Gene 2:95-113.
3. Brenner, D. J. 1973. Deoxyribonucleic acid reassociation in the taxonomy of enteric bacteria. Int. J. Syst. Bacteriol. 23:298-307.
4. Casadaban, M. 1976. Regulation of the regulatory gene for the arabinose pathway, araC. J. Mol. Biol. 104:557-566.
5. Casadaban, M. J., and S. N. Cohen. 1980. Analysis of gene control signals by DNA fusion and cloning in Escherichia coli. J. Mol. Biol. 138:179-207.
6. Clarke, P., H. C. Lin, and G. Wilcox. 1982. The nucleotide sequence of the araC regulatory gene in Salmonella typhimurium LT2. Gene 18:157-163.
7. Cohen, S. N., A. C. Y. Chang, and L. Hsu. 1972. Nonchromosomal antibiotic resistance in bacteria: genetic transformation of Escherichia coli by R-factor DNA. Proc. Natl. Acad. Sci. USA 69:2110-2119.
8. Englesberg, E. 1971. Regulation in the L-arabinose system, p. 257-295. In H. J. Vogel (ed.), Metabolic regulation, vol. 5. Academic Press, Inc., New York.
9. Englesberg, E., and G. Wilcox. 1974. Regulation: positive control. Annu. Rev. Genet. 8:219-242.
10. Filer, D., R. Dhar, and A. V. Furano. 1981. The conservation of DNA sequences over very long periods of evolutionary time. Eur. J. Biochem. 120:69-77.
11. Greenfield, L., T. Boone, and G. Wilcox. 1978. DNA sequence of the araBAD promoter in Escherichia coli B/r. Proc. Natl. Acad. Sci. USA 75:4724-4728.
12. Hendrickson, W., and R. Schleif. 1985. A dimer of AraC protein contacts three adjacent major groove regions of the araI DNA site. Proc. Natl. Acad. Sci. USA 82:3129-3133.
13. Horwitz, A. H., L. Heffernan, C. Morandi, J. H. Lee, J. Timko, and G. Wilcox. 1981. DNA sequence of the araBAD-araC controlling site region in Salmonella typhimurium LT2. Gene 14:309-319.
14. Horwitz, A. H., C. Morandi, and G. Wilcox. 1980. Deoxyribonucleic acid sequence of araBAD promoter mutants of Escherichia coli. J. Bacteriol. 142:659-667.
15. Ish-Horowicz, D., and J. F. Burke. 1981. Rapid and efficient cosmid cloning. Nucleic Acids Res. 9:2989-2998.
16. Lee, N. 1978. Molecular aspects of ara regulation, p. 389-409. In J. H. Miller and W. S. Reznikoff (ed.), The operon. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
17. Lee, N. L., W. O. Gielow, and R. G. Wallace. 1981. Mechanism of araC autoregulation and the domains of two overlapping promoters, Pc and PBAD, in the L-arabinose regulatory region of Escherichia coli. Proc. Natl. Acad. Sci. USA 78:752-756.
18. Lei, S.-P., H.-C. Lin, L. Heffernan, and G. Wilcox. 1985. Cloning of the pectate lyase genes from Erwinia carotovora and their expression in Escherichia coli. Gene 35:63-70.
19. Lin, H.-C., S.-P. Lei, and G. Wilcox. 1985. The araBAD operon from Salmonella typhimurium LT2. I. Nucleotide sequence of araB and primary structure of its product, ribulokinase. Gene 34:111-122.
20. Lin, H.-C., S.-P. Lei, and G. Wilcox. 1985. An improved DNA sequencing strategy. Anal. Biochem. 147:114-119.
21. Miyada, C. G., L. Stoltzfus, and G. Wilcox. 1984. Regulation of the araC gene of Escherichia coli: catabolite repression, autoregulation, and effect on araBAD expression. Proc. Natl. Acad. Sci. USA 81:4120-4124.
22. Rigby, P. W. J., M. Dieckmann, C. Rhodes, and P. Berg. 1977. Labeling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113:237-251.
23. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
24. Smith, G. E., and M. D. Summers. 1980. The bidirectional transfer of DNA and RNA to nitrocellulose or diazobenzyloxymethyl paper. Anal. Biochem. 109:123-129.
25. Vieira, J., and J. Messing. 1982. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19:259-268.
26. Wahl, G. M., M. Stern, and G. R. Stark. 1979. Efficient transfer of large DNA fragments from agarose gels to diazobenzyloxymethyl-paper and rapid hybridization by using dextran sulfate. Proc. Natl. Acad. Sci. USA 76:3683-3687.
27. Wilcox, G., P. Meuris, R. Bass, and E. Englesberg. 1974. Regulation of the L-arabinose operon BAD in vitro. J. Biol. Chem. 249:2946-2952.