We still consider the E. coli genome a fairly typical bacterial genome, and given the extensive information available about this organism and its lifestyle, it is a useful point of departure for subsequent discussion of prokaryotic genome diversity.

[Figure: Comparative Size Distribution of Prokaryotic Genomes]

Physical Form of the Genome vs. the Genetic Map

The circularity of the E. coli genetic map reflects the physical form of the DNA genome. All of the genes in the "core" genome of E. coli are encoded in a single double-stranded DNA molecule that has no ends. (Contrast this with the linear genetic maps typical of eukaryotic chromosomes.) Although we sometimes refer to the DNA as being "circular", this distorts physical reality. If the genome were laid out in a perfect circle, the circumference would be about 1 mm, at least several hundred times the dimensions of an E. coli cell. Clearly the DNA is not in a true circle while inside a cell, but is highly folded and compacted. So, while sophisticated E. coli geneticists refer to the genetic map as circular, they describe the genomic DNA as being "covalently continuous".

Vital Statistics Derived from Complete Genome Sequencing (for E. coli MG1655)

The classic E. coli genetic map is arbitrarily divided into 100 "minutes", corresponding to 4,639 kbp of DNA containing 4,288 ORFs. The average size of an ORF is close to 1 kb. Therefore, approximately 90% of the DNA in the genome is coding sequence. Most genes are "single copy". Only a few very highly expressed genes are present in multiple, functionally equivalent copies. Fewer than half of these genes had been discovered by the techniques of "classical" bacterial genetics when the complete genome sequence was published [3].
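The arithmetic behind these statistics can be checked with a short sketch. The genome size, ORF count, and average ORF length are taken from the text. The rise of 0.34 nm per base pair (B-form DNA) is an assumption that the text does not state, and the function names are purely illustrative.

```python
# Values quoted in the text for E. coli MG1655.
GENOME_KBP = 4_639      # genome size in kilobase pairs
ORF_COUNT = 4_288       # number of ORFs
MEAN_ORF_KB = 1.0       # "close to 1 kb", so this is a rounded figure

# Assumption (not stated in the text): B-form DNA rises 0.34 nm per bp.
NM_PER_BP = 0.34


def coding_fraction(orf_count, mean_orf_kb, genome_kbp):
    """Fraction of the genome occupied by ORFs of the given mean size."""
    return orf_count * mean_orf_kb / genome_kbp


def contour_length_mm(genome_kbp, nm_per_bp=NM_PER_BP):
    """Length of the DNA if stretched out, in millimetres."""
    length_nm = genome_kbp * 1_000 * nm_per_bp
    return length_nm * 1e-6  # 1 mm = 1e6 nm


def kbp_per_map_minute(genome_kbp, map_minutes=100):
    """Average amount of DNA represented by one 'minute' of the genetic map."""
    return genome_kbp / map_minutes


if __name__ == "__main__":
    print(f"coding fraction : {coding_fraction(ORF_COUNT, MEAN_ORF_KB, GENOME_KBP):.2f}")
    print(f"contour length  : {contour_length_mm(GENOME_KBP):.2f} mm")
    print(f"kbp per minute  : {kbp_per_map_minute(GENOME_KBP):.1f}")
```

With the rounded 1 kb ORF size the coding fraction comes out slightly above 0.9, consistent with "approximately 90%". The contour length lands at the same order of magnitude as the "about 1 mm" quoted above.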
Functional Distribution of E. coli ORFs

FUNCTIONAL CATEGORY                       # Proteins       %
DNA Replication, Recombination, Repair           115     2.7
Regulatory Proteins                              133     3.1
Enzymes for Cell Structural Components           182     4.2
Translation                                      182     4.2
Physiological Responses to Environment           188     4.4
Energy Metabolism                                243     5.7
Other Enzymes of Intermediary Metabolism         318     7.4
Biosynthetic Enzymes                             340     7.9
Transport Proteins                               427    10.0
Other known                                      528    12.3
Unknown                                        1,632    38.1
TOTAL                                          4,288   100.0

Core Genome vs. Pan Genome

Lysogenic viruses, plasmids, and transposable elements are variably present among strains, and all of them are capable of horizontal gene transfer into other strain lineages. This has led to the concept that each individual strain of a bacterial species shares a common or "core" array of genes with all other strains of that species. However, each strain will also have a unique and variable set of adjunct genetic sequences, either in the bacterial chromosome itself or in extrachromosomal elements such as plasmids. These "adjunct" genetic sequences are referred to collectively as the "pan" genome. (Much of the genomics literature calls these variable sequences the "accessory" genome and reserves "pan genome" for the core plus all accessory genes found across a species; this primer uses "pan genome" in the narrower sense.)

As a generalization, core genomic sequences are not subject to promiscuous horizontal gene transfer, and are frequently essential for the basic function of the cell. The genes for DNA Polymerase III and for ribosomal RNA would be considered part of the core genome of an E. coli strain.

The pan genome consists of sequences that a strain may have acquired by horizontal gene transfer. These sequences may be essential for the cell under certain special circumstances, but are not involved with basic cell functions. A plasmid carrying antibiotic resistance genes would be considered part of the pan genome.

"Chromosomal Islands" are relatively large, contiguous tracts of chromosomal genes that have evidently been acquired from another organism by lateral (horizontal) gene transfer. They are also considered part of the pan genome. The virulence of several pathogenic E.
coli strains is associated with genes in chromosomal islands, in which case the islands are sometimes referred to as "pathogenicity islands". Characteristics that allow recognition of a chromosomal island include:
i.) The set of contiguous genes is not uniformly present in closely related strains.
ii.) The %GC of the island differs from that of the genome as a whole.
iii.) The pattern of synonymous codon preference differs from that of the majority of genes in the genome.
iv.) The DNA sequences at the island boundaries are often recognizably similar to sequences found in lysogenic virus genomes or in transposable elements.

Operons and Regulons

Operons are contiguous clusters of several related genes whose expression is coordinately regulated through transcriptional control of a single polycistronic mRNA. The classic example is the lac operon. Regulons are sets of non-contiguous related genes whose expression is coordinately regulated even though they are transcribed independently as separate mRNAs. The classic example is the heat shock regulon.

Replication Strategy

Bidirectional replication is initiated from a unique origin (oriC at 84') with the terminus directly opposite the origin. Under conditions of rapid growth, bacteria use a strategy called dichotomous replication. This means that new cycles of DNA replication begin before the previous cycle is complete, allowing for a minimum cell doubling time (15 min.) that is less than the minimum time required to replicate the genome (60 min.). At rapid growth rates (doubling time less than 30 min.), dichotomous replication means that genes located near the origin of replication are present at a higher copy number than genes near the terminus. This is thought to be why highly expressed genes are often preferentially located near the origin of replication, where their relative copy number is higher.

[Figure: Bacterial genome exhibiting dichotomous replication. Four origins of replication are shown as green dots; six replication forks are shown by red arrows. The replication terminus is labeled.]

Another interesting aspect of genome organization observed in E. coli is that highly expressed genes are usually transcribed in the same direction that they are replicated. This is thought to mitigate conflicts between RNA polymerase (transcription) and replication complexes for access to the same DNA template, allowing faster replication.
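The gene dosage effect described under Replication Strategy can be illustrated with a minimal sketch. It assumes the standard steady-state relation for exponentially growing cultures, in which the average copy number of a gene relative to the terminus is 2 raised to the power (replication time x fractional distance from the terminus / doubling time). That relation is not given in the text and is brought in here as an assumption; the 60 min replication time is the figure quoted above, and the function name is hypothetical.

```python
def copies_relative_to_terminus(position, replication_time_min, doubling_time_min):
    """Average copy number of a gene relative to a gene at the terminus.

    position: 0.0 for a gene at the origin, 1.0 for a gene at the terminus,
              measured along one replication arm.
    Assumes steady-state exponential growth and a constant fork speed
    (a textbook model, not something stated in the primer itself).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 (origin) and 1.0 (terminus)")
    if replication_time_min <= 0 or doubling_time_min <= 0:
        raise ValueError("times must be positive")
    exponent = replication_time_min * (1.0 - position) / doubling_time_min
    return 2 ** exponent


if __name__ == "__main__":
    REPLICATION_TIME_MIN = 60  # value quoted in the text
    for doubling_time in (60, 30, 20):
        ratio = copies_relative_to_terminus(0.0, REPLICATION_TIME_MIN, doubling_time)
        print(f"doubling time {doubling_time:>2} min -> origin/terminus ratio {ratio:.1f}")
```

Under these assumptions, a 30 min doubling time gives four origin-proximal copies for every terminus-proximal copy, which matches the four origins drawn in the figure described above. At slower growth the ratio falls toward 2 or below, so the dosage advantage of an origin-proximal location shrinks.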
Transcription-Translation Coupling

The lack of a nuclear compartment in prokaryotes leads to one of the most important molecular differences between prokaryotes and eukaryotes: in prokaryotes, transcription and translation of protein-coding genes are coupled (simultaneous). In other words, ribosomes are translating the 5' end of an mRNA before the 3' end of the mRNA has been transcribed. Coupling of transcription to translation reduces the opportunity for mRNA processing (intron splicing, etc.). Introns are exceedingly rare in the bacterial genomes sequenced so far. There are no introns in the E. coli genome. The only transcripts subject to significant processing are those leading to functional rRNAs and tRNAs.

Horizontal Gene Transfer

Bacteria do not have a sexual life cycle; neither meiosis nor fertilization has ever been documented. In fact, they do not even carry out mitotic cell divisions, in the sense that they lack the centrioles, centrosomes, and spindle apparatus used by most eukaryotes for chromosome segregation. Genetic diversity leading to rapid genomic evolution relies instead on 3 primary mechanisms of horizontal gene transfer:
1. Plasmid-mediated conjugation.
2. Specialized transduction and generalized transduction, mediated by bacterial viruses (bacteriophage).
3. Transformation: the uptake of free DNA from solution.

Transposable Elements

Transposable elements are DNA sequences that are capable of mediating their own movement (transposition) to new locations within the genome they inhabit, or to other genetic elements present in the same cell. Barbara McClintock was awarded a Nobel Prize for her pioneering discovery of transposable elements in the genome of maize. Transposable elements of various types are widespread in the genomes of eukaryotes and bacteria. In bacteria, transposable elements can generally be assigned to one of two major types, "Insertion Sequences (IS)" and "Composite Transposons".
In practice, composite transposons are typically referred to simply as "transposons".

Insertion sequences (ISs) are smaller (1-2 kb) transposable elements whose only genes are directly related to promotion and regulation of their transposition, typically the gene for the so-called transposase enzyme. IS elements are characterized by short, terminal, inverted repeat sequences with the ORF or ORFs in between. They are normal constituents of many bacterial chromosomes and plasmids.
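The defining structural feature of an IS element, its terminal inverted repeats, can be checked with a short sketch: the left end of the element should read the same as the reverse complement of the right end. The function names and the toy sequence are hypothetical, and the check demands a perfect match, which is a simplification, since the repeats of real IS elements are often imperfect.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element, max_len=50):
    """Length of the longest perfect terminal inverted repeat.

    Compares the left end of the element, base by base, with the reverse
    complement of its right end and stops at the first mismatch.
    """
    seq = element.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)  # the two repeats cannot overlap
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical toy element: 7 bp repeat, 9 bp "ORF", inverted 7 bp repeat.
    toy_is = "GGTCTGA" + "ATGAAACCC" + "TCAGACC"
    print(terminal_inverted_repeat_length(toy_is))
```

To tolerate the imperfect repeats seen in practice, the exact comparison could be replaced with a scoring scheme that allows a few mismatches. That extension is left out here to keep the sketch short.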
A PRIMER of E. coli GENETICS and GENOMICS 3/17/11

Composite transposons generally consist of two copies of the same IS element flanking variable amounts of other DNA sequences coding for one or several genes with diverse functions. The entire transposon moves as a single unit. The best known transposons are those which were discovered as parts of antibiotic resistance plasmids. The diagram below compares the typical structure of an IS element with the transposon Tn5. Tn5 carries 3 antibiotic resistance genes sandwiched between 2 copies of IS50. Only IS elements will be discussed below.

Transposable elements are a game changer in bacterial genomes. They participate in a bewildering array of molecular events that alter the genomes they inhabit. The most important of these are:

Transposition
IS movement and insertion at a different location in the same DNA molecule, or in a different molecule in the cell. The transposition process is often accompanied by replication of the IS (replicative transposition), leading to an increase in the copy number of the IS. The diagram below shows replicative transposition of an IS.

Insertional Inactivation
Insertion of an IS within a coding sequence generally leads to the loss of gene function (null mutation).

Homologous Recombination
Multiple copies of the same IS in the same cell are substrates for homologous recombination events that may lead to DNA deletions, sequence inversions, or fusion of separate DNA molecules. For example, homologous recombination between copies of the same IS element in a conjugal
plasmid and the bacterial chromosome leads to formation of Hfr strains, as shown below.

[Figure: IS-mediated integration of the F plasmid (F+) into the chromosome to form an Hfr strain]

The promiscuity of transpositional and recombinational events associated with transposable elements unlocks the Pandora's Box of genome plasticity for the bacterial chromosomes and plasmids in which they are found. In fact, the K-12 laboratory strains of E. coli show considerable variability in the number and location of transposable elements in their genomes, due to transposition events that have occurred since the parent strain was first isolated in 1922.

History of E. coli K-12 Laboratory Strains

If we go directly to nature (e.g., a wastewater plant, people with genitourinary tract infections, cattle feedlots) as a source of E. coli strains, we have no difficulty isolating a genetically diverse array of genotypes that fall under the technical definition of E. coli. This can be interesting and productive from the perspective of clinical microbiology, ecology, and evolution. If, on the other hand, we are using E. coli as a model organism, it is customary to use the same well-documented strain as other investigators, so that results from different labs can be readily compared. Genetic studies with E. coli have traditionally been conducted largely with descendants of a specific E. coli isolate designated "K-12" [1].

The strain Escherichia coli K-12 was isolated in 1922 from the stool of a convalescent diphtheria patient in a clinical bacteriology lab in Palo Alto, California. In 1925, the culture was deposited in the strain collection of the Department of Bacteriology at Stanford University, where it was given the designation K-12. Strain K-12 gave typical results in the standard tests used for the identification of E. coli and was therefore used for many years in bacteriology lab classes as a typical example of E. coli. This original culture is still maintained in the department collection. In the early 1940s, E. L.
Tatum, then at Stanford, asked the bacteriology department for some bacteria to test for possible use in his studies of biochemical genetics. By great good luck he was given, along with cultures of other species of bacteria, E. coli K-12, which proved to be ideally suited to his studies because it is prototrophic, easy to cultivate in a defined medium, and grows rapidly. The use of this bacterium permitted easy study of very large populations and thus the accurate analysis of very rare events, such as spontaneous mutations, presenting a great advantage in this respect over the plants, animals, and fungi previously used in genetic studies. In 1944, Tatum and Gray reported the isolation of the first auxotrophic mutants of strain K-12. In 1946, Lederberg and Tatum demonstrated genetic recombination in strain K-12, further opening the door to genetic research. Since that time, many thousands of derivatives of strain K-12 have been created in laboratories around the world. All the strains used in this course are in the K-12 family.
Another interesting property of K-12 lab strains is that, during many years of laboratory cultivation, they have lost the ability to colonize the human GI tract. This makes them potentially safer to use than "wild" E. coli strains [2].

The original E. coli K-12 strain, isolated in 1922, came from nature with a pan genome containing a large conjugal plasmid (the F plasmid) and a lysogenic bacteriophage genome (Lambda). These elements are not found uniformly in all strains of E. coli in nature, and they have been lost even in many descendants of K-12. The first E. coli strain to be subjected to whole genome sequencing is a derivative of K-12 designated MG1655. MG1655 differs from K-12 by the removal of the F plasmid and the Lambda genome, and is the closest thing we have to a "generic" E. coli.

[Figure: from Chart 8 in Bachmann (1996)]
REFERENCES
(Available online through links on the course website References page.)

1 Bachmann, B. J. (1996) Derivations and Genotypes of Some Mutant Derivatives of Escherichia coli K-12. Chapter 133 in Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd ed., Vol. 2.
This is a description of the E. coli K-12 family tree. It would be a waste of time to slog through the whole paper at this point. However, you might read just the little information it provides on the derivation of strain MG1655 from the original wild-type.

2 Smith, H. W. (1975) Survival of orally administered E. coli K12 in alimentary tract of man. Nature 255: 500-502.

3 Blattner, et al. (1997) The Complete Genome Sequence of Escherichia coli K-12. Science 277: 1453.