BACTERIAL GENOMICS

Reading in BOM-12:
Sec. 11.1 Genetic Map of the E. coli Chromosome, p. 279
Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents, p. 344
Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis and Gene Distributions, p. 347
Sec. 13.12 Evolution of Virulence: Pathogenicity Islands, p. 362
Review Questions: Chap. 11: #1; Chap. 13: #1, 2, 3.

11.1 Genetic Map of the E. coli Chromosome

We still consider the E. coli genome a fairly typical bacterial genome. Given the extensive information available about this organism and its lifestyle, it is a useful point of departure for the subsequent discussion of prokaryotic genome diversity.
Physical Form of the Genome vs. the Genetic Map

The circularity of the E. coli genetic map [Fig. 10.42] reflects the physical form of the DNA genome. All of the genes in the "core" genome of E. coli are encoded in a single double-stranded DNA molecule that has no ends. (Contrast this with the linear genetic maps typical of eukaryotic chromosomes.) Although we sometimes refer to the DNA as being "circular", this distorts physical reality. If the genome were laid out in a perfect circle, the circumference would be about 1.6 mm (4,639 kbp at roughly 0.34 nm per base pair), several hundred times the length of an E. coli cell. Clearly the DNA inside a cell is not a true circle, but is highly folded and compacted. So, while the sophisticated E. coli geneticist refers to the genetic map as circular, they describe the genomic DNA as being "covalently continuous".

Core Genome vs. Pan Genome

The variable presence of lysogenic viruses, plasmids, transposable elements, and the like lends itself to the concept that each individual strain of a bacterial species shares a common or "core" array of genes with all other strains of that species. The core genome includes most or all "housekeeping genes": genes essential for basic cell functions such as replication, transcription and translation. However, each strain also carries its own variable set of adjunct genetic sequences, either in the bacterial chromosome or in extrachromosomal elements such as plasmids. The core genes together with all of these variable sequences make up the "pan" genome of the species. The variable part of the pan genome often includes (inessential) genes that allow the cell to adapt to special circumstances. The text points out that the original lab strain of E. coli K-12, isolated in 1922, came from nature carrying two such adjunct elements: a large conjugative plasmid (the F plasmid) and a lysogenic bacteriophage genome (lambda). These elements are not found uniformly in all strains of E. coli.

Vital Statistics Derived from Complete Genome Sequencing (for E. coli MG1655)

The classic genetic map is arbitrarily divided into 100 "minutes", corresponding to 4,639 kbp of DNA containing 4,288 ORFs. The average size of an ORF is close to 1 kbp. Therefore, approximately 90% of the DNA in the genome is coding sequence.
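The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the genome size, ORF count, and 100-minute map come from the notes, the mean ORF size of 1 kbp is the notes' own approximation, and the rise of 0.34 nm per base pair is an assumption (the usual value for B-form DNA) that does not appear in the notes. The function names are illustrative.

```python
GENOME_KBP = 4639       # genome size from the notes (E. coli MG1655)
ORF_COUNT = 4288        # number of ORFs from the notes
MEAN_ORF_KBP = 1.0      # "close to 1 kbp" in the notes; an approximation
MAP_MINUTES = 100       # the genetic map is divided into 100 minutes
RISE_NM_PER_BP = 0.34   # assumption: B-form DNA, not stated in the notes


def coding_fraction(orf_count, mean_orf_kbp, genome_kbp):
    """Fraction of the genome occupied by ORFs of the given mean size."""
    return orf_count * mean_orf_kbp / genome_kbp


def kbp_per_minute(genome_kbp, map_minutes):
    """Average amount of DNA represented by one minute of the genetic map."""
    return genome_kbp / map_minutes


def contour_length_mm(genome_kbp, rise_nm_per_bp):
    """Length of the DNA if stretched out, in millimetres."""
    length_nm = genome_kbp * 1000 * rise_nm_per_bp   # kbp -> bp -> nm
    return length_nm * 1e-6                          # nm -> mm


if __name__ == "__main__":
    print(f"coding fraction: {coding_fraction(ORF_COUNT, MEAN_ORF_KBP, GENOME_KBP):.2f}")
    print(f"kbp per minute:  {kbp_per_minute(GENOME_KBP, MAP_MINUTES):.1f}")
    print(f"contour length:  {contour_length_mm(GENOME_KBP, RISE_NM_PER_BP):.2f} mm")
```

With the figures from the notes this gives a coding fraction of about 0.92, roughly 46 kbp per map minute, and a contour length of about 1.6 mm. The coding fraction is sensitive to the assumed mean ORF size, so it should be read as "roughly 90%" rather than as a precise value.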
Gene Copy Number

Most genes in bacterial genomes occur in a single copy. rRNA genes are a notable exception: they occur in copy numbers of 1-15 in sequenced genomes. Higher copy numbers facilitate higher-level expression of the gene products.

Functional Distribution of E. coli ORFs

FUNCTIONAL CATEGORY                       # Proteins       %
DNA Replication, Recombination, Repair           115     2.7
Regulatory Proteins                              133     3.1
Enzymes for Cell Structural Components           182     4.2
Translation                                      182     4.2
Physiological Responses to Environment           188     4.4
Energy Metabolism                                243     5.7
Other Enzymes of Intermediary Metabolism         318     7.4
Biosynthetic Enzymes                             340     7.9
Transport Proteins                               427    10.0
Other known                                      528    12.3
Unknown                                        1,632    38.1
TOTAL                                          4,288   100.0

Operons and Regulons (see text p. 231)

An operon is a set of coordinately expressed genes transcribed from a single promoter (on a single polycistronic mRNA). A regulon is a set of coordinately expressed genes (or operons) that are transcribed from separate promoters (on separate mRNAs).

DNA Base Methylation

Methylated bases (5-methylcytosine, etc.) are relatively common in the E. coli genome. At least some base methylation is related to the presence of Restriction/Modification Systems and of Methyl-Directed Mismatch Repair.

Bidirectional Replication

Replication proceeds bidirectionally from a fixed origin (oriC, at 84' on the genetic map) to a terminus on the opposite side of the chromosome.

Dichotomous Replication

Dichotomous replication means that new cycles of DNA replication begin before the previous cycle is complete. This allows the minimum cell doubling time (15 min.) to be less than the minimum time required to replicate the genome (60 min.).
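The timing argument above can be turned into a rough estimate of how overlapping rounds of replication affect gene copy number. The sketch below uses a standard textbook relation for steady-state exponential growth (the Cooper-Helmstetter model), which is not part of these notes: the average ratio of origin copies to terminus copies is 2^(C/tau), where C is the time to replicate the genome and tau is the doubling time. The function names and the idea of expressing map position as a fraction of one replication arm are illustrative assumptions.

```python
def ori_to_ter_ratio(replication_min, doubling_min):
    """Average origin:terminus copy-number ratio, 2 ** (C / tau)."""
    return 2 ** (replication_min / doubling_min)


def relative_dosage(map_fraction, replication_min, doubling_min):
    """Average copy number of a gene relative to a gene at the terminus.

    map_fraction is the distance from the origin along one replication arm:
    0.0 at the origin, 1.0 at the terminus.
    """
    remaining = 1.0 - map_fraction   # share of the arm still to be copied
    return 2 ** (replication_min * remaining / doubling_min)


if __name__ == "__main__":
    # Figures quoted in the notes: 60 min to replicate, 15 min doubling time.
    C, TAU = 60, 15
    print("ori:ter ratio:", ori_to_ter_ratio(C, TAU))
    for m in (0.0, 0.5, 1.0):
        print("map fraction", m, "relative dosage", relative_dosage(m, C, TAU))
```

With the notes' figures, C/tau = 4 and the model predicts an origin:terminus ratio of 16; at a 60-minute doubling time the ratio drops to 2. The model assumes a constant replication time at all growth rates, which is a simplification, so the numbers are best read as an illustration of the trend.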
At rapid growth rates (doubling time less than 30 min.), dichotomous replication means that genes located near the origin of replication are present in more copies per cell than genes near the terminus. Highly expressed genes are preferentially located near the origin of replication, which increases their relative copy number.

Transcription-Translation Coupling

The lack of a nuclear compartment in prokaryotes leads to one of the most important molecular differences between prokaryotes and eukaryotes: in prokaryotes, transcription and translation of protein-coding genes are coupled (simultaneous). In other words, ribosomes begin translating the 5' end of an mRNA before the 3' end of that mRNA has been transcribed. Coupling of transcription to translation reduces the opportunity for RNA processing (intron splicing). Introns are exceedingly rare in the bacterial genomes sequenced so far. There are no introns in the E. coli genome.

Chromosomal Islands, Pathogenicity Islands and Lateral Gene Flow

Chromosomal islands are contiguous clusters of genes that have evidently been acquired from another organism by lateral (horizontal) gene transfer. The genes in an island typically confer an adjunct property on the cell, such as symbiosis or virulence; when the property is virulence, the chromosomal island is referred to as a pathogenicity island. Chromosomal islands would be considered part of the pan genome of a species. Characteristics that allow recognition of a chromosomal island include:

i.) the genes are not present in closely related strains or groups.
ii.) the %GC of the island differs from that of the genome as a whole.
iii.) the pattern of synonymous codon preference differs from that of the majority of genes in the genome.
iv.) the DNA sequences at the island boundaries are similar to sequences found in lysogenic virus genomes or in transposable elements.
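Criterion ii.) above lends itself to a simple computational screen: slide a window along the sequence and flag windows whose %GC departs from the genome-wide value. The following is a minimal sketch of that idea, not the method used by any particular genome-analysis tool. The function names, window size, step, and threshold are all hypothetical choices, and the demonstration sequence is a made-up toy, not real E. coli data.

```python
def gc_percent(seq):
    """Percent G+C in a DNA string (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=1000, step=500, threshold=10.0):
    """Return (start, end, gc) for each window whose %GC differs from the
    whole-sequence %GC by more than `threshold` percentage points."""
    overall = gc_percent(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_percent(genome[start:start + window])
        if abs(gc - overall) > threshold:
            hits.append((start, start + window, gc))
    return hits


if __name__ == "__main__":
    # Toy sequence: 50% GC background with an AT-rich 2 kb insert in the middle.
    background = "ACGT" * 1000   # 4 kb, 50% GC
    island = "AAAT" * 500        # 2 kb, 0% GC
    genome = background + island + background
    for start, end, gc in atypical_gc_windows(genome):
        print(start, end, round(gc, 1))
```

On real genomes a screen like this would only nominate candidate regions. The other criteria in the list (absence from related strains, codon preference, and phage-like or transposon-like boundary sequences) would still be needed to support the call.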
Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents

Discussion Questions

1. Why are highly expressed genes in E. coli located nearer the origin of replication than genes expressed at lower levels?

2. In E. coli the genes for the ribosomal RNAs (rRNAs) are organized into 7 nearly identical operons. In other words, the genes for the different forms of rRNA (5S, 16S and 23S) are adjacent to each other and copied into a single transcript that is later processed, and there are 7 nearly identical copies of this rRNA operon. The location and direction of transcription of the 7 operons are shown in the diagram. What do these facts suggest about expression of the rRNA genes?

3. There are 86 tRNA genes in the genome of E. coli. Overall, the direction of transcription of these genes is approximately 50% clockwise and 50% counterclockwise (with respect to the genetic map). However, all the genes transcribed in a clockwise direction are in one half of the genome, and the genes transcribed counterclockwise are in the other half. What is the basis of this pattern of transcriptional orientation?