Copyright © 2010 by the Genetics Society of America
DOI: 10.1534/genetics.109.111963
Genetics 184: 1077–1094 (April 2010)

Duplication Frequency in a Population of Salmonella enterica Rapidly Approaches Steady State With or Without Recombination

Andrew B. Reams,*,1 Eric Kofoid,* Michael Savageau and John R. Roth*

*Department of Microbiology, College of Biological Sciences and Department of Biomedical Engineering, University of California, Davis, California 95616

Manuscript received November 11, 2009
Accepted for publication January 9, 2010

1 Corresponding author: University of California, 316 Briggs Hall, One Shields Ave., Davis, CA 95616. E-mail: abreams@ucdavis.edu

ABSTRACT

Tandem duplications are among the most common mutation events. The high loss rate of duplication suggested that the frequency of duplications in a bacterial population (1/1000) might reflect a steady state dictated by relative rates of formation (k_F) and loss (k_L). This possibility was tested for three genetic loci. Without homologous recombination (RecA−), duplication loss rate dropped essentially to zero, but formation rate decreased only slightly and a steady state was still reached rapidly. Under all conditions, steady state was reached faster than predicted by formation and loss rates alone. A major factor in determining steady state proved to be the fitness cost, which can exceed 40% for some genomic regions. Depending on the region tested, duplications reached 40–98% of the steady-state frequency within 30 generations, approximately the growth required for a single cell to produce a saturated overnight culture or form a large colony on solid medium (10^9 cells). Long-term bacterial populations are stably polymorphic for duplications of every region of their genome. These polymorphisms contribute to rapid genetic adaptation by providing frequent preexisting mutations that are beneficial whenever imposed selection favors increases in some gene activity. While the reported results were obtained with the bacterium Salmonella enterica, the genetic implications seem likely to be of broader biological relevance.

TANDEM genetic duplications are probably the most common mutation type in terms of their rate of formation and their frequency in an overnight culture. Roughly 10% of cells in an unselected laboratory culture of Salmonella enterica carry a duplication of some chromosomal region and 0.005–3% have a duplication of a specified gene (Anderson and Roth 1977). The situation may be even more extreme in humans, whose genomes contain hundreds of copy number variations (CNVs) (Sharp et al. 2005; Korbel et al. 2007; Kidd et al. 2008). The phenotypes caused by duplications can be detected by selection and contribute to fitness whenever growth is limited by quantity or activity of a particular protein (Sonti and Roth 1989; Tlsty et al. 1989; Andersson et al. 1998). Selected increases in gene copy number can enhance the likelihood of point mutations that further increase fitness by providing more targets for change (Roth et al. 2006; Sandegren and Andersson 2009; Sun et al. 2009).

Little is known about duplication formation or why duplications are so frequent in unselected populations. Duplications form frequently between separated sequence repeats, suggesting a role for homologous recombination (Figure 1, top). For example, the most frequently duplicated regions of the chromosome of S. enterica are those between copies of the rrn cistrons, with 6.5 kb of nearly identical sequence (Anderson and Roth 1981). In contrast, less common duplications arise between regions with little or no sequence homology, whose formation seems unlikely to require recombination (Kugelberg et al. 2006). The role of recombination in duplication formation has been difficult to assess because previous duplication assays depended on recombination proficiency. New assays described here suggest that duplications can form without homologous recombination even when extensive sequence repeats serve as junction points.

Duplication loss occurs at 1% per generation and is essentially eliminated in recA mutant strains (Anderson and Roth 1981), which lack a catalyst of strand invasion that is essential to homologous recombination in otherwise normal strains. Exchanges like those leading to duplication loss can also further amplify gene copy number (Figure 1, bottom).

This study was initiated to test the possibility that duplication frequency in a population might reach a steady state dictated by the relative rates of formation (k_F) and loss (k_L), as diagrammed in Figure 2 (top). Initially, the fitness cost (growth deficit) of duplications was assumed to be small and was not considered. If rates of formation and loss dictate the steady-state duplication frequency (and fitness cost is negligible), a steady state is expected when formation and loss are equivalent. This extreme situation is diagrammed at the top of Figure 2.
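The balance between formation and loss described above can be illustrated with a minimal Python sketch. This is not the authors' spreadsheet: the discrete per-generation update, the function name duplication_ratio, and the values k_F = 1e-5 and k_L = 1e-2 are assumptions chosen only to match the orders of magnitude quoted in the text (loss near 1% per generation, a frequency near 1/1000). With equal growth rates the ratio D/H in this toy model settles at k_F/k_L.

```python
def duplication_ratio(k_f, k_l, generations, rel_fitness=1.0):
    """Return D/H after `generations` haploid doublings, starting from haploids only.

    Hypothetical discrete-generation model (an assumption, not the authors' code):
    each generation haploids double, duplication cells grow by 2**rel_fitness,
    then a fraction k_f of haploids gain a duplication and a fraction k_l of
    duplication-bearing cells lose it.
    """
    h, d = 1.0, 0.0
    growth_d = 2.0 ** rel_fitness
    for _ in range(generations):
        grown_h = 2.0 * h
        grown_d = growth_d * d
        h = grown_h * (1.0 - k_f) + grown_d * k_l
        d = grown_d * (1.0 - k_l) + grown_h * k_f
        total = h + d  # renormalize so the cell numbers never overflow
        h, d = h / total, d / total
    return d / h


if __name__ == "__main__":
    k_f, k_l = 1e-5, 1e-2  # illustrative values only
    for n in (30, 300, 3000):
        print(n, duplication_ratio(k_f, k_l, n))
    print("k_f / k_l =", k_f / k_l)
```

In this sketch the approach to k_F/k_L is slow when fitness cost is ignored, which is one way to see why formation and loss rates alone may not account for a rapid approach to steady state.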

Figure 1. Formation and loss of duplications. Duplications are thought to arise by exchanges between separated elements on sister chromosomes. These elements vary in size from several base pairs to multiple kilobases. Once a duplication is in place, the extensive sequence repeats are subject to unequal recombination events between sister chromosomes that can lead to loss of the duplication (reversion) or to further increases in copy number (amplification). Both loss and further amplification are expected to occur at the same rate (k_L).

The contribution of fitness cost to steady states can be seen if one eliminates duplication loss by setting k_L close to zero (as in a recA mutant). Under these extreme conditions (Figure 2, bottom), the duplication frequency can come to a steady state if the duplication-bearing cell grows more slowly than the parent haploid. At this steady state, the formation of new duplications just compensates for decreases in duplication frequency due to the growth deficit.

In discussing these predictions, one must realize that independent duplications of a particular locus may differ in endpoints and amounts of included flanking material (see Figure 3). The measured duplication formation rate for some locus is thus the sum of these separate rates. In contrast, loss rate is measured for an isolated mutant with one particular duplication type. The results described here suggest that for most regions, the variety of duplication types is small enough that one can use the aggregate formation rates and average loss rate in predicting the approach of duplication frequency to steady state. The variability in size and fitness cost of duplications predicts that during prolonged growth, the new duplications added to the population will represent the full gamut of types and those with the greatest fitness cost will be lost preferentially. Thus with time, even near steady state, the frequency of lower-cost duplications will increase at the expense of those causing a slower growth rate.
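Both predictions above (a cost-balanced steady state when k_L is zero, and a gradual shift toward lower-cost duplication types) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the discrete-generation update, the function name simulate_types, the treatment of a 40% fitness cost as a relative growth rate of 0.6, and all rate values. The closed form used in the first case is simply the fixed point of this toy update, not a formula from the paper.

```python
def simulate_types(types, generations):
    """Return [D_i / H] for each duplication type after `generations` haploid doublings.

    `types` is a list of (k_f, k_l, rel_fitness) tuples, one per duplication type.
    Hypothetical model: haploids double each generation, type i grows by
    2**rel_fitness, then haploids convert to type i at rate k_f and type i
    reverts to haploid at rate k_l.
    """
    h = 1.0
    d = [0.0] * len(types)
    for _ in range(generations):
        grown_h = 2.0 * h
        grown_d = [di * 2.0 ** w for di, (_, _, w) in zip(d, types)]
        new_h = grown_h * (1.0 - sum(kf for kf, _, _ in types))
        new_h += sum(gd * kl for gd, (_, kl, _) in zip(grown_d, types))
        new_d = [gd * (1.0 - kl) + grown_h * kf
                 for gd, (kf, kl, _) in zip(grown_d, types)]
        total = new_h + sum(new_d)  # renormalize to avoid overflow
        h = new_h / total
        d = [x / total for x in new_d]
    return [x / h for x in d]


def cheap_fraction(mix, generations):
    """Fraction of all duplication cells that belong to the first type in `mix`."""
    cheap, costly = simulate_types(mix, generations)
    return cheap / (cheap + costly)


if __name__ == "__main__":
    # Case 1: no loss (k_l = 0, as in a recA mutant) and an assumed 40% growth deficit.
    k_f, w = 1e-5, 0.6
    ratio = simulate_types([(k_f, 0.0, w)], 200)[0]
    fixed_point = 2.0 * k_f / (2.0 * (1.0 - k_f) - 2.0 ** w)
    print("simulated D/H:", ratio, "fixed point of the update:", fixed_point)

    # Case 2: two types formed at equal rates, one low cost and one high cost.
    mix = [(5e-6, 1e-2, 0.98), (5e-6, 1e-2, 0.60)]
    for n in (1, 30, 300):
        print(n, "low-cost share of duplications:", cheap_fraction(mix, n))
```

The second case starts with both types equally represented and shows the low-cost type gaining share over time, in line with the qualitative prediction in the text.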
If small duplications have lower cost, the average duplication size should drop with time.

Results presented here suggest unexpected features of duplications. Homologous recombination contributes weakly to duplication formation but is almost essential for loss. Duplications can have a high fitness cost that contributes heavily (with formation and loss rates) to rapid establishment of steady-state frequencies. The growth required for a single cell to produce a saturated 1-ml liquid culture or a large colony on solid medium (10^9 cells) allows duplication frequency to approach steady state. The steady-state duplication frequency (typically 0.1% for any locus) can be considered a stable polymorphism that provides frequent copy number variants. These variants can contribute to rapid genetic adaptation whenever selection conditions favor increased gene activity. We expect that the general conclusions regarding steady-state duplication frequencies will apply to copy number variation in any organism. A proposal for rapid evolution of new genes by means of selective gene amplification has been described previously (Bergthorsson et al. 2007).

Figure 2. Conditions maintaining a steady-state duplication frequency. (Top) Every chromosomal region is subject to duplication that converts a haploid cell (H) to one with a duplication (D). The concentrations of H and D cells increase with growth rate constants μ_H and μ_D. Haploid cells give rise to duplications with rate constant k_F and diploids lose their duplication with the rate constant k_L. (Middle) When growth rates of haploid and diploid cells are equal, duplication frequency is dictated by rates of formation and loss, since k_F ≪ k_L. (Bottom) When duplications cannot be lost by reversion (k_L = 0), a steady state can be reached if duplication strains grow less rapidly than haploids and formation rate balances growth deficit.

MATERIALS AND METHODS

Strains and media: All strains were derivatives of S. enterica (Typhimurium) strain LT2. Primary strains are listed in Table 1. Rich medium was Luria broth (LB) with antibiotics as described below.
The chromogenic β-galactosidase substrate 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal) was obtained from Diagnostic Chemicals, Oxford, CT and used in plates at 40 μg/ml.

Figure 3. Measured events in duplication formation and loss. The rate of duplication formation describes a variety of different events that provide two copies of the assayed locus. The rate of duplication loss is assayed for a particular duplication, which may or may not be typical of the whole collection.

Determining increases in duplication frequency during extended growth periods: Three duplication assay methods are described in the appendix. The K-Kan assay was used for reported experiments, but its results were confirmed by both the T-Recs and the drug-in-drug assays. For the K-Kan assay, all strains carried a lacZ+ allele at the test locus (e.g., inserts in the chromosomal argH and pyrD genes and the standard lacZ locus on F′128). This lacZ gene was expressed constitutively due to an insertion in the repressor gene (lacI) of a promoterless kanamycin resistance determinant (K). All strains carried a plasmid (pSIM5) with repressed genes (Red) for lambda recombination (Court et al. 2002). The seven strains used in these assays are listed in Table 1 (TT25467, -25468, -25469, -25996, -25997, -26002, and -26003) and the material inserted at the test locus is diagrammed in the appendix.

To determine the duplication frequency after 33 generations, a 3-ml culture in LB was grown overnight at 30° to maintain repression of the Red recombination functions. One aliquot of saturated culture was used to assay cell number and another to determine duplication frequency by transformation. Transformations were done after 15 min at 42° to induce Red genes. Recombinants were selected on LB kanamycin X-gal plates. Each transformant that acquired KanR simultaneously lost the adjacent lacZ+ gene. White (Lac−) transformant colonies indicated a haploid recipient cell; blue (Lac+) colonies indicated a recipient with a duplication of the test locus. A third aliquot from the overnight culture was used to inoculate a continuous growth chamber containing 40 ml of fresh LB medium. A continuous growth chamber (turbidostat) was used to follow accumulation of duplications over extended time periods.
Rich LB medium with chloramphenicol (to maintain the Red plasmid) was pumped continuously into a 40-ml culture vessel (and culture allowed to flow out) to maintain a constant volume and cell population size (i.e., cell loss by dilution balanced gain by growth). Cultures grew at 30° with stirring at 200 rpm and continuous flushing with hydrated air. The medium flow rate was adjusted to maintain midlog phase (OD650 of 0.2), well below saturation level (OD650 = 1.1). The number of cell generations was calculated from the flow rate (i.e., one generation per 40 ml removed). Wild-type cells grew 60 generations per day (24 min/generation). At various times, samples were removed and assayed for duplication frequency.

Fitness cost measurements: All strains carried a recA mutation to prevent duplication loss and a plasmid (pSIM5) encoding phage lambda recombination functions (Red), which were repressed during growth and induced during the final duplication assay. It was assumed that, while a recA mutation reduces all growth rates slightly, it did not affect the relative fitness of duplication and haploid strains. Control experiments showed that recA effectively prevented duplication loss. Over the entire growth period, <3% of cells lost the lac duplication on F′128 and <0.001% of cells lost chromosomal duplications. The haploid control strain carried a chromosomal KanR insertion to ensure that differences in growth rate were not due to the KanR. Overnight cultures were diluted 800-fold into 6 wells of a 96-well plate containing LB with 10 μg/ml chloramphenicol (to maintain the repressed Red plasmid) and growth rates were measured using a Synergy HT plate reader (Bio-Tek). The culture plate was incubated at 30° with continuous shaking and absorbance at 650 nm was read at 15-min intervals. For each strain, the growth rate measurement was repeated for a minimum of three independent clones. The standard deviation of the relative rates was <0.02.
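The text does not say how growth rates were derived from the absorbance readings. One common approach, shown here purely as an assumption, is a least-squares fit of ln(OD650) against time during exponential growth, with relative fitness taken as the ratio of the fitted rates. The function names and the synthetic readings below are hypothetical and are not data from the study.

```python
import math


def growth_rate(times_min, od650):
    """Least-squares slope of ln(OD650) versus time, in units of 1/min."""
    logs = [math.log(v) for v in od650]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def relative_fitness(rate_duplication, rate_haploid):
    """Ratio of growth rates, written mu_D / mu_H in the text."""
    return rate_duplication / rate_haploid


if __name__ == "__main__":
    # Synthetic readings at 15-min intervals (made up for illustration):
    # the haploid doubles every 30 min, the duplication strain every 50 min.
    times = list(range(0, 121, 15))
    haploid = [0.01 * 2 ** (t / 30) for t in times]
    duplication = [0.01 * 2 ** (t / 50) for t in times]
    mu_h = growth_rate(times, haploid)
    mu_d = growth_rate(times, duplication)
    print("doubling time (min):", math.log(2) / mu_h)
    print("relative fitness:", relative_fitness(mu_d, mu_h))
```

Real plate-reader curves would need the exponential window selected before fitting; that step is omitted here.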
The small fitness costs of lac duplications on plasmid F′128 were also determined by the more sensitive method of direct growth competition between each of four duplication strains and the haploid parent strain (Figure 4). Strains to be compared were made recA to minimize duplication loss during growth and each was genetically tagged with either a tetracycline or a spectinomycin resistance cassette inserted into the chromosomal proAB genes. Results were the same with reversed tags. Five 3-ml tubes of LB medium containing 10 μg/ml chloramphenicol were inoculated with 10^6 cells of each of the two strains to be competed. After an overnight growth cycle, the culture was diluted and the process repeated for multiple passages. After each cycle, samples were plated for single colonies on LB plates, which were then printed to a tetracycline and a spectinomycin plate, to determine the ratio of the two cell types.

The relative fitness was calculated using a spreadsheet simulation in which virtual populations were started at the input ratios used in the actual experiment. In the simulation, haploids and diploids were allowed to double at distinct rates (μ_D and μ_H). The duplication formation rate (k_F) was set to zero because recA duplication strains produce essentially no haploid cells from which new duplications could form. In the experiment, new duplications were also ignored in that the haploid cells were counted without regard to any new copy number variants, which would be rare and growth impaired. The duplication loss rate (k_L) was essentially zero due to recA, but was set to the low rate measured for the specific duplications being analyzed. To determine relative fitness, the relative growth rate of haploid and diploid cells (μ_D/μ_H) was varied in the simulation until a value was found that predicted the ratio of cell types determined in the experiment. The ratio of growth rates or relative fitness (μ_D/μ_H) allowed estimation of the duplication growth rate (μ_D).
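The fitting procedure described above (vary μ_D/μ_H until the simulated ratio of cell types matches the observed one) can be sketched as follows. This is an illustration under stated assumptions, not the authors' spreadsheet: the discrete-generation update and the function names are hypothetical, the manual variation of the parameter is replaced by bisection, and the "observed" ratio in the usage example is generated by the model itself rather than taken from an experiment.

```python
def dup_to_hap_ratio(start_ratio, rel_fitness, k_l, generations):
    """Simulated D/H after `generations` haploid doublings of a mixed culture.

    Hypothetical model: haploids double each generation, duplication cells grow
    by 2**rel_fitness, and a fraction k_l of them revert to haploid. The
    formation rate k_F is set to zero, as in the text.
    """
    h, d = 1.0, start_ratio
    growth_d = 2.0 ** rel_fitness
    for _ in range(generations):
        grown_d = growth_d * d
        h = 2.0 * h + grown_d * k_l
        d = grown_d * (1.0 - k_l)
        total = h + d  # renormalize to avoid overflow
        h, d = h / total, d / total
    return d / h


def fit_relative_fitness(start_ratio, observed_ratio, k_l, generations,
                         lo=0.0, hi=1.5, tol=1e-9):
    """Bisection on rel_fitness; the simulated ratio rises monotonically with it."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if dup_to_hap_ratio(start_ratio, mid, k_l, generations) < observed_ratio:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Stand-in "observation" produced by the model with a known relative fitness.
    observed = dup_to_hap_ratio(1.0, 0.97, 1e-4, 60)
    print("recovered relative fitness:",
          fit_relative_fitness(1.0, observed, 1e-4, 60))
```

Multiplying the fitted relative fitness by a measured haploid growth rate would then give an estimate of μ_D, as the text describes.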
Determining duplication loss rate (k_L): Assayed cells carried a duplication trapped using the K-Kan method and thus had a (KanR)(Lac+) phenotype. The loss rate of the KanR or Lac+ phenotype was measured in single colonies growing at 30° on nonselective plates of solid LB medium containing X-gal. Each colony served as an individual culture in which the observed frequency of cells losing the duplication was converted to a loss rate by corrections for the different growth rates of haploids and duplication-bearing strains. Five single colonies were analyzed for each trapped duplication strain. Small colonies appearing overnight (23 generations) were plugged using the wide end of a sterilized Pasteur pipette and cells were suspended in 1 ml of minimal medium (NCE) in a 1.5-ml Eppendorf tube. Cells were serially diluted in NCE medium and plated on LB X-gal to determine total cell number and the fraction that was Lac+. The plates were replica printed to LB kanamycin X-gal plates to determine the number of duplication-bearing cells (KanR Lac+) and the fraction of cells that retained KanR but had lost their Lac+ copy. The frequency of duplication loss was the sum of

TABLE 1
Strains used in this study

Strain | Genotype | Source or reference
TR10000 | Wild-type S. enterica (serovar Typhimurium) LT2 | Lab collection
TT22240 | recA643::Tn10dTc | Williams et al. (2006)
TT25467 | his-644(ΔOGDCBHAF) pro-621/F′128 pro+ lacI4650::K(sw, KanS), lacZY+/pSIM5(CmR) | This report
TT25468 | his-644(ΔOGDCBHAF) pro-621 recA650::Gnt(sw)/F′128 pro+ lacI4650::K(sw, KanS), lacZY+/pSIM5(CmR) | This report
TT25469 | his-644(ΔOGDCBHAF) pro-621 recA651::Rif(sw)/F′128 pro+ lacI4650::K(sw, KanS), lacZY+/pSIM5(CmR) | This report
TT25478 | pyrD2266::Tn10(TcS Kan)/pSIM5(CmR) | This report
TT25479 | pyrD2266::Tn10(TcS Kan) recA650::GntR(sw)/pSIM5(CmR) | This report
TT25486 | argH1823::Tn10(TcS Kan)/pSIM5(CmR) | This report
TT25487 | argH1823::Tn10(TcS Kan) recA650::Gnt(sw)/pSIM5(CmR) | This report
TT25706 | argH1823::Tn10(T-Recs) recA650::Gnt(sw) | This report
TT25707 | pyrD2266::Tn10(T-Recs) recA650::Gnt(sw) | This report
TT25791 | Wild-type LT2/F′128 lacI4650::K(sw, KanS), lacZY+ | This report
TT25792 | recA651::Rif(sw)/F′128 lacI4650::K(sw, KanS), lacZY+ | This report
TT25794 | leuD21 proB1657::Tn10 recA650::Gnt(sw)/F′128 pro+ IS3C::Rif(sw) ΔlacIZ4652::KanR(sw) lacA4653::(CmR, Oc recA+) | This report
TT25996 | argH1947::lac[lacI4650::K(sw, KanS), lacZY+]/pSIM5(CmR) | This report
TT25997 | argH1947::lac[lacI4650::K(sw, KanS), lacZY+] recA650::Gnt(sw)/pSIM5(CmR) | This report
TT26002 | pyrD2827::lac[lacI4650::K(sw, KanS), lacZY+]/pSIM5(CmR) | This report
TT26003 | pyrD2827::lac[lacI4650::K(sw, KanS), lacZY+] recA650::Gnt(sw)/pSIM5(CmR) | This report
TT26046 | argH1947::lac[lacI4650::K(sw, KanS), lacZY+] pyrD2826::Kan recA650::Gnt(sw)/pSIM5(CmR) | This report
TT26047 | argH1947::lac[lacI4650::K(sw, KanS), lacZY+] purH2378::Kan recA650::Gnt(sw)/pSIM5(CmR) | This report
TT26049 | pyrD2827::lac[lacI4650::K(sw, KanS), lacZY+] purH2378::Kan recA650::Gnt(sw)/pSIM5(CmR) | This report
TT26050 | pyrD2827::lac[lacI4650::K(sw, KanS), lacZY+] hisC10309::Kan recA650::Gnt(sw)/pSIM5(CmR) | This report
TT26056 | his-644(ΔOGDCBHAF) proAB670::sw-SpcR recA650::Gnt(sw)/F′128 pro+ lacI4650::K(sw, KanS), lacZY+] IS3A::Kan(sw)/pSIM5(CmR) | This report
TT26057 | his-644(ΔOGDCBHAF) proAB1657(TcR) recA650::Gnt(sw)/F′128 pro+ lacI4650::K(sw, KanS), lacZY+] IS3A::Kan(sw)/pSIM5(CmR) | This report
TT26059 | his-644(ΔOGDCBHAF) proAB670(SpR) recA650::Gnt(sw)/F′128 pro+ DUP2066[lacI4650::K(sw, KanS), lacZY+] IS3AC join [ΔlacIZ4652::Kan(sw)]/pSIM5(CmR) | This report
TT26061 | his-644(ΔOGDCBHAF) proAB1657(TcR) recA650::Gnt(sw)/F′128 pro+ DUP2066[lacI4650::K(sw, KanS), lacZY+] IS3AC join [ΔlacIZ4652::Kan(sw)]/pSIM5(CmR) | This report
TT26065 | his-644(ΔOGDCBHAF) proAB1657(TcR) recA650::Gnt(sw)/F′128 pro+ DUP2067[lacI4650::K(sw, KanS), lacZY+] REP25/REP32/17 join [ΔlacIZ4652::Kan(sw)]/pSIM5(CmR) | This report
TT26066 | his-644(ΔOGDCBHAF) proAB670(SpR) recA650::Gnt(sw)/F′128 pro+ DUP2067[lacI4650::K(sw, KanS), lacZY+] REP25/REP32/17 join [ΔlacIZ4652::Kan(sw)]/pSIM5(CmR) | This report
TT26069 | his-644(ΔOGDCBHAF) proAB1657(TcR) recA650::Gnt(sw)/F′128 pro+ DUP2068[lacI4650::K(sw, KanS), lacZY+] REP26/REP32/17 join [ΔlacIZ4652::Kan(sw)]/pSIM5(CmR) | This report
TT26070 | his-644(ΔOGDCBHAF) proAB670(SpR) recA650::Gnt(sw)/F′128 pro+ DUP2068[lacI4650::K(sw, KanS), lacZY+] REP26/REP32/17 join [ΔlacIZ4652::Kan(sw)]/pSIM5(CmR) | This report
TT26073 | his-644(ΔOGDCBHAF) proAB1657(TcR) recA650::Gnt(sw)/F′128 pro+ DUP2069[lacI4650::K(sw, KanS), lacZY+] IS3BC join [ΔlacIZ4652::Kan(sw)]/pSIM5(CmR) | This report

sw designates alleles in which the normal sequence was replaced by a drug resistance determinant. T-Recs designates a derivative of Tn10 with a CmR determinant and a constitutive (Oc) recA+ allele inserted into the middle of the Tn10 tetA gene, rendering the strain sensitive to tetracycline and resistant to chloramphenicol. lacI4650::K(sw, KanS) refers to a lacI gene replaced by a promoterless kanamycin resistance determinant. TcS Kan designates a Tn10dTc element in which a KanR determinant is inserted in the tetA gene, rendering the strain sensitive to tetracycline and resistant to kanamycin. The pSIM5(CmR) plasmid carries the recombination genes (red) of phage lambda.

frequencies of cells that had lost either the KanR or the Lac+ phenotype. The total generations of nonselective growth were calculated from the total cell number in the colony.

To determine the loss rate constant, k_L, a spreadsheet simulation was run in which a virtual single duplication-bearing cell grew and produced new haploid segregants. The simulation used previously measured growth rates of parent duplication cells and derived haploids (μ_D and μ_H). The spreadsheet kept track of the entire population of haploid and duplication cells. Haploids were assumed never to reach a frequency sufficient to contribute significantly to formation of new diploids. The value of k_L in this spreadsheet was varied until the simulation predicted the frequency of haploid cells that was observed experimentally (after a number of generations equal to that measured for the experimental colony). Both the experiment and the spreadsheet simulation ignored formation of new diploids, assuming that the small size of the haploid segregant population and the low duplication rate made this negligible. In this experimental assay, any new duplication arising from haploid segregants would retain the haploid phenotype, either KanR Lac− or KanS Lac+, and would not be scored as a duplication (KanR)(Lac+). Jackpots were not an issue in the simulation since fractional numbers of haploids were allowed. In the colonies, jackpots were minimal because segregation rates were so high. The median k_L value among five colonies was taken as the k_L value for that strain; these values showed very little variance.

Predicting k_F and the steady-state frequency on the basis of frequency after 33 generations: The k_F value was estimated from the rate at which duplications accumulate in a culture grown from a single haploid cell. This value was the most difficult to determine because newly formed duplication cells grew more slowly than the parent and were rapidly lost by segregation (k_L is typically 1000-fold higher than k_F).
In addition, our methods for determining duplication frequency required 10^9 cells, which necessitates 30 generations of growth before the first measurement could be made. The value of k_F was estimated using a simulation in which a virtual colony was grown from a single haploid cell dividing at measured rate μ_H and producing diploids that grew at measured rate μ_D. Duplications formed and disappeared at rates k_F (unknown) and k_L (described above). At each haploid generation, the number of haploid and duplication-bearing cells was calculated. The value of k_F was varied until the frequency of duplications predicted for haploid generation 33 of the simulation equaled that observed at the corresponding point in the actual culture. The value of k_F determined in this way was used as one estimate for the strain being tested. Essentially, the simulation used the measured values of k_L, μ_H, and μ_D with the measured duplication frequency (D/H) at generation 33 to estimate k_F and the expected steady-state frequency for the particular duplication. The simulation avoided Luria–Delbrück fluctuations by using fractional cell numbers. The solid lines in Figure 5 indicate the trajectory of duplication frequency based on this simulation. The trajectory was set to agree with the measured frequency at generation 33. The values obtained using the spreadsheet simulation are verified quantitatively below.

RESULTS

Genomic regions assayed: Duplications described here affected three sites: two in the chromosome and one on plasmid F′128 (see Figure 4). The argH gene is located between direct-order copies of the 6.5-kb rrn repeats and is among the most frequently duplicated sites in the chromosome (>1% in unselected cultures) (Anderson and Roth 1981). The pyrD gene is far away from any rrn locus and has a low duplication frequency of 0.005% in an overnight culture. The lac operon of Escherichia coli on plasmid F′128 was chosen because amplification of this region plays a major role in reversion of lac under selection in the Cairns system (Cairns and Foster 1991; Hendrickson et al. 2002; Slechta et al. 2003; Kugelberg et al. 2006).
Like the argH gene, the plasmid lac region is flanked by large repeats (IS3; 1258 bp) and by clusters of shorter (30-bp) REP elements (Bachellier et al. 1999; Kofoid et al. 2003). The lac locus is duplicated in 0.2% of cells in an unselected culture.

Figure 4. Three genomic loci assayed for duplication accumulation. The argH locus is between direct-order copies of the 6.5-kb rrn genes. The lac locus lies between repeated direct-order copies of the IS3 element (131 kb apart) on a low copy conjugative plasmid (F′128), whose transfer replication origin is responsible for intense recombination on the plasmid (Seifert and Porter 1984b; Syvanen et al. 1986; Carter et al. 1992). The chromosomal pyrD locus is not flanked by major repeats and is likely to be typical of most regions of the chromosome.

Duplication frequency approached a steady state in both recA+ and recA strains: During prolonged growth of recA+ strains, the frequency of duplications reached a steady-state level for all three regions tested, as seen in Figure 5 (see data points for recA+ strains). The data for recA mutant strains are described later. The solid lines describe the duplication accumulation predicted from a simulation that takes into account fitness costs and measured rates of duplication formation and loss. The close fit between experimental data and predicted accumulation should be noted; the simulation is described later. All three loci were assayed in unselected cultures grown for extended periods in rich medium with continuous dilution to maintain a constant cell density (turbidostat). The several assay methods used are described in the appendix. The duplication frequency data presented were obtained using the K-Kan method and results for individual strains were confirmed by the other tests (T-Recs and drug-in-drug); all three methods gave closely comparable frequencies. In the K-Kan

1082 A. B. Rems et l. Figure 5. Accumultion of duplictions in long-term cultures. Cultures initited with single hploid cell grew in LB with constnt dilution to mintin the culture in midlog phse. The dupliction frequency ws determined using the K-Kn ssy. Circles re for reca 1 hploid prent nd tringles re for reca mutnt prent. Since the dupliction ssy required lrge popultion size, the erliest time point t which n ssy could be mde ws 33 genertions. The plotted lines describe the dupliction ccumultion predicted by spredsheet simultion using the mesured D/H (t 33 genertions) nd the mesured fitness cost nd dupliction loss rte. Mesurement of these vlues is described in the text. method, ech tested strin hs copy of the lci-z region t the test site (inserted t rgh or pyrd or present on F9 128 lc). The lci gene ws replced by knmycin resistnce determinnt lcking both promoter nd n initition codon (K), leving cells phenotypiclly Kn S nd Lc 1 (expressed constitutively). A dupliction of this site ws identified by Red-medited trnsformtion in which n introduced single DNA strnd provided promoter nd initition codon for K nd thereby conferred drug resistnce (Kn R ). Inheritnce of the promoter frgment lso generted deletion extending into the lcz gene, cusing Lc phenotype. In

identifying duplications, haploid K recipient cells (KanS, Lac+) gave rise to KanR Lac− transformants. Recipient cells with a duplication of the tested region gave rise to KanR transformants that remain Lac+ by virtue of their second copy of the region: (KanR Lac−) (KanS Lac+). This method can be used in either recA+ or recA strains since recombination mediated by the Red functions of phage lambda (recombineering) is independent of RecA (Court et al. 2002).

The K-Kan assay required that strains carry genes for the Red recombination system of phage lambda. These genes were repressed during growth of the population and were induced only immediately before duplication trapping. To be certain that these Red functions did not contribute to duplication formation or loss (especially in recA mutant strains), measured duplication frequencies were confirmed using a transduction-based duplication assay, T-Recs (see Appendix A). In the T-Recs assay, an otherwise wild-type recA mutant is grown nonselectively to allow accumulation of duplications of any of the test loci (pyrD+, argH+, or lac+). Cells with a duplication were identified by a P22-mediated transduction cross that introduced a chloramphenicol resistance gene (CmR) insertion into the locus being tested (pyrD::CmR, argH::CmR, or lac::CmR). The element carrying CmR (T-Recs) also includes a constitutively expressed functional recA+ gene. The RecA function produced from the transduced fragment allows the donor allele to recombine with the chromosome of the recA mutant recipient. Recipient (recA) cells with no duplication gave CmR transductants that were defective for the function of the test locus (Pyr−, Arg−, or Lac−). Recipient cells with a duplication formed during pregrowth gave CmR transductants that inherited the donor insertion allele in one copy of the test locus but retained a functional test allele in the other. Using this assay, duplications were found to accumulate at the same rate as that inferred using the K-Kan assay.
A third assay (drug-in-drug; Appendix A) also confirmed the results in Figure 5. The steady-state frequency of the lac duplications was also independently confirmed by quantitative PCR amplification of the predominant duplication junction fragment (between IS3C and IS3A).

Use of the T-Recs assay gave further evidence that Red functions (used in the K-Kan assay) do not contribute to duplication formation. The duplication accumulation measured by T-Recs after 30 generations of a recA strain was the same with and without the Red plasmid. (A brief induction of Red functions at 42°, as used in the K-Kan assay, caused no increase in the assayed duplication frequency.) Thus Red functions appear to make no contribution to duplication formation even when fully induced. The rate of duplication loss was also not affected by Red functions. Duplication strains with a recA mutation showed essentially no duplication loss (described below) and this negligible rate was not increased by a plasmid providing the repressed Red genes (lambda recombination functions) even after a 15-min period of induced Red expression. Thus the Red functions carried by strains for the K-Kan assay did not contribute to either formation or loss of duplications. The ability of Red to contribute to efficient transformation but not internal rearrangement is expected since Red-mediated transformation seems to require a high concentration of input fragments (Court et al. 2002).

Estimating the duplication loss rate (kL): Independent duplications of a particular locus can include different amounts of material within which an exchange can lead to loss (Figures 1 and 3). Therefore, one might expect different duplications to show different loss rates. These rates were measured for a series of independent duplications of each locus, trapped using the K-Kan transformation method described above (see also Appendix A). Five single colonies of each duplication strain (KanR, Lac+) were tested for the loss of either the Lac+ or the KanR phenotype during colony formation growth (23 generations) on nonselective LB plates (see materials and methods).
After growth, the entire colony was plugged and used to estimate total cell number and frequency of accumulated haploid cells. The frequency was used to estimate loss rate (kL), using a spreadsheet simulation (see materials and methods). The magnitude of the correction made by this simulation varies with the growth rate of each particular duplication tested, which can be very different even for two duplications of the same locus. In each case the fitness defect of the particular strain was used in estimating its loss rate. In general, fitness costs were greatest for duplications of argH, where the uncorrected kL value was two- to sevenfold higher than the value obtained taking fitness into consideration. For all other loci the corrections were less than twofold. The duplications described in Figure 6 were isolated and tested for loss rate in the same genetic background (either recA or recA+), because rates of duplication loss will be used to assess the approach to steady state in strains with and without RecA. The duplications whose loss rate is shown in Figure 6 appeared to be structurally similar regardless of their origin. All duplications tested show a loss rate that is highly dependent on RecA (see below).

Role of homologous recombination (RecA) in duplication loss: The homologous recombination function RecA has been shown repeatedly to be important for the exchange events leading to duplication loss (Anderson and Roth 1979). This was confirmed by the observation that duplications isolated in a recA mutant background showed a very low loss rate (Figure 6), while those isolated in recA+ strains showed a rather high but variable loss rate (0.003–0.06/cell/generation). All of these duplications, regardless of their origin (recA+ or recA), seem to be qualitatively the same. That is, the unstable duplications isolated in recA+ strains become

stable when a recA allele is added and conversely stable duplications arising in recA strains show a typical high loss rate when a recA+ allele is provided. Furthermore, duplication formation rates are similar in recA+ and recA backgrounds (see below). The primary effect of RecA is on duplication loss. For the chromosomal argH and pyrD loci, no case of duplication loss was observed in the absence of RecA (<10⁻⁶/cell/division). The few cases of recA-independent loss of a lac duplication on F′128 (9 × 10⁻⁴) may reflect deletions arising on the plasmid, which has a very high rate of genetic rearrangement stimulated by its conjugative replication origin (Silver et al. 1980; Seifert and Porter 1984a,b; Tlsty et al. 1984; Carter and Porter 1991; Carter et al. 1992).

Figure 6. Estimated rate constants for duplication loss (kL). Duplication loss was measured during growth of single colonies on LB medium. Each data point represents one independently isolated duplication mutant and is the median value of five subclones of that mutant (SD, 38%). The data show the variation between different duplications. The bars indicate the median loss rate of the different assayed duplication mutants, followed (in parentheses) by the total number of duplications assayed.

Estimating the rate constant for duplication formation (kF): At early points in the growth of haploid strains, duplication frequency is low and therefore the number of duplications lost is negligible. This should in principle allow an estimate of duplication formation rate based on the initial accumulation rate. However, because available assay methods require a large population, the earliest time in the growth period at which frequency could be measured was 33 generations. The measured frequency of duplications at this time point is marked on the curves in Figure 5. At this point, duplication frequency estimates can be seriously influenced by duplication loss and growth rate differences (duplication fitness cost).
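A minimal sketch of why a frequency measured at 33 generations cannot be read directly as a formation rate. This is not the authors' spreadsheet simulation. It assumes a simple two-population model in which haploid cells (H) and duplication-bearing cells (D) grow exponentially, formation and loss occur in proportion to growth, and time is counted in haploid generations. The function name, the Euler step size, and the illustrative inputs (of the order listed in Table 3 for lac in a recA strain) are assumptions made here.

```python
import math

def simulate_ratio(k_f, k_l, rel_fitness, generations, steps_per_gen=1000):
    """Return D/H after the given number of haploid generations.

    Assumed model (time unit = 1/mu_H, so one generation = ln 2):
        dH/dt = H * (1 - k_f) + k_l * w * D
        dD/dt = w * D * (1 - k_l) + k_f * H
    where w = mu_D / mu_H is the relative fitness of duplication cells.
    """
    w = rel_fitness
    dt = math.log(2.0) / steps_per_gen    # Euler step size
    h, d = 1.0, 0.0                       # start from a single haploid cell
    for _ in range(int(round(generations * steps_per_gen))):
        dh = h * (1.0 - k_f) + k_l * w * d
        dd = w * d * (1.0 - k_l) + k_f * h
        h += dt * dh
        d += dt * dd
    return d / h

# Hypothetical inputs: formation rate, loss rate, and relative fitness.
with_cost = simulate_ratio(k_f=2.13e-5, k_l=0.9e-3, rel_fitness=0.97, generations=33)
# Same formation rate, but no loss and no fitness cost.
neutral = simulate_ratio(k_f=2.13e-5, k_l=0.0, rel_fitness=1.0, generations=33)
print(f"D/H at 33 generations with loss and cost: {with_cost:.2e}")
print(f"D/H at 33 generations if neutral:         {neutral:.2e}")
```

Under these assumptions the neutral run accumulates a visibly higher D/H than the run with loss and fitness cost, so dividing an observed frequency by elapsed growth would understate the formation rate unless both effects are corrected for.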
To assess kF, the measured frequency at 33 generations was corrected for effects of duplication loss and growth rate differences. The correction was made using a spreadsheet simulation in which a virtual haploid cell was allowed to grow (at rate μH) and form duplications at rate kF (not yet determined). The simulation uses determined growth rates for haploids and diploids (see below) and the estimated loss rate kL (determined above). The unknown formation rate (kF) was varied in that simulation until the duplication frequency (D/H) at 33 generations equaled the experimentally determined frequency at this point in the growth period. The kF values for the three loci are presented in Figure 5 (see solid curves). By running the simulation past the 33-generation point, the approach to a steady-state frequency can be predicted. Figure 5 shows that the simulation predicts a trajectory and steady-state duplication frequency that matches the experimentally measured increase in the frequency of duplications. This agreement suggests that the four parameters (kF, kL, μH, and μD) are sufficient to explain the observed accumulation of duplications. As expected if these four variables dictate accumulation, different loci (whose duplications behave differently) show distinct values of kF and attain different steady states at distinct rates. For example, lac duplication frequencies in recA strains reached only 40% of the steady-state value after 33 generations, while argH duplication frequencies in recA+ strains reached 98% of the steady-state value. The relationship between the approach to steady state and the four parameters also agrees with the analytical treatment of duplication accumulation seen below.

Role of RecA in duplication formation: Table 2 shows average rates of duplication formation for each locus and the effect of RecA. It was surprising that duplication rates dropped so little in a recA mutant. The recA mutation essentially eliminates homologous recombination in otherwise normal strains and was seen above to eliminate duplication loss at all three loci (Figure 6).
This raises a question regarding the mechanisms underlying duplication formation. The pyrD locus lacks major flanking sequence repeats and may be typical of most chromosomal loci; pyrD shows the lowest duplication formation rate and a rather low dependence on RecA. The lac locus (with flanking 1.3-kb repeats) shows a nearly 100-fold higher duplication rate (in recA+), but only a slightly higher dependence on RecA. The high duplication rate for lac may reflect both the presence of the IS3 repeats and the stimulatory effect on recombination of DNA ends generated by the plasmid conjugative transfer origin. Most surprising is the behavior of the argH locus, which lies in the midst of four direct-order rrn repeats (each 6.5 kb). The duplication rate for argH is 1000-fold higher than that of pyrD but shows essentially no dependence on RecA. The high rate of argH duplication is consistent with the presence of large, direct-order rrn repeats (see Figure 4), but the lack of dependence on recombination suggests that exchanges between rrn loci may often arise without standard recombination.

TABLE 2
Duplication formation rate (kF) based on frequency after 33 generations

Duplicated locus    recA+          recA           RecA effect on duplication formation (Rec+/Rec−)
pyrD                4.6 × 10⁻⁶     1.6 × 10⁻⁶     2.9
lacZ                30.0 × 10⁻⁵    2.7 × 10⁻⁵     11.1
argH                2.0 × 10⁻³     1.9 × 10⁻³     1.1

This kF value is based on the duplication frequency after 33 generations corrected for growth rate differences and kL using a spreadsheet simulation.

Attaining a steady-state duplication frequency in the absence of RecA: Initially it was expected that the steady-state duplication frequency would be dictated entirely by relative rates of formation and loss (kF/kL). This expectation is contradicted by the effects of a recA mutation seen in Figure 5 (see data points for recA mutant strains). The recA mutation had a minor effect on formation (kF) and essentially eliminated loss (kL). If the initial expectation had been correct, then a recA mutation should have caused a great increase in the steady-state duplication frequency (kF/kL) by severely reducing kL and modestly reducing kF. The reverse is seen in Figure 5: the recA mutation reduced steady-state duplication frequency for all tested loci. This suggested that another factor must contribute to establishment of the steady state.

Role of fitness cost in approach to steady state: The fitness cost of duplications appears to be the missing contributor to attainment of a steady state. If duplication strains grew more slowly than the haploid parent, then the frequency of duplications could reach a steady state, even when there is essentially no reversion (kL = 0). This is described in Figure 2 above. Steady-state frequency can be reached when increases caused by de novo duplication formation counterbalance decreases caused by slower growth of the duplication strains. Under conditions that allow both reversion rate and fitness cost to contribute, a steady state is expected when the formation rate balances the combined effects of duplication loss rate and growth deficit.
The approximate relationship is that steady-state frequency (D/H) = formation rate/(loss rate + fitness cost). To test this, duplications were first tested for fitness cost.

Fitness costs of duplications: Growth rates were measured for independently formed duplications of each of the three genomic regions being tested (see Figure 7). Some of those duplications were isolated in a recA mutant background and others in a recA+ parent strain as described above. All those from the recA+ strain received a recA mutation just before the growth rate test to prevent duplication loss during growth. Without this stabilization, haploid segregants (with higher growth rates) quickly overgrew cultures and obscured the effect of the duplication on growth rate. In estimating duplication fitness effects in recA mutants, it is assumed that, even though the recA mutation reduced growth rate slightly, the ratio of growth rates for diploid and haploid strains was independent of recombination. Relative fitness was the ratio of the growth rate of a duplication strain (μD) and that of the isogenic haploid parent (μH). In Figure 7, it can be seen that strains with duplications of lac on the F′ plasmid are nearly as fit as the haploid parent (median cost is ≤3%). This might be expected since the plasmid carries no essential genes. Of the 17 lac duplication strains tested, 14 extend between the IS3A and IS3C elements and include 131 kb of F′ plasmid material (see Figure 3); these duplications have a fitness cost near 3%. Three lac duplications contained only 20 kb of material (1 was isolated in recA+ and 2 in recA parents); these smaller duplications have a lower (1%) fitness cost. Duplications of both types have been described previously (Kugelberg et al. 2006). The low fitness cost of small lac duplications helps explain why cells with such a duplication quickly amplify and show rapid growth under selection for increased lacZ activity (Cairns and Foster 1991; Roth et al. 2006). Duplications of the pyrD locus show a wide range of fitness costs.
This is likely to reflect a variety of endpoints and sizes, but the exact extent of these duplicated regions has not been determined. The argH duplications form between various pairs of the flanking rrn loci, two on each side of argH (Figure 3). These duplications vary in fitness cost and seem to fall into two general classes, which may reflect repeated use of these few prominent rrn endpoint sequences. The regions between rrn loci encode a variety of highly expressed housekeeping proteins, which may explain the high fitness cost of some of these duplications.

Can fitness cost and relative rates of formation and loss explain steady-state duplication frequencies? Duplication frequency seems to attain a steady state due to the combined effects of formation and loss rates and the fitness cost of a duplication relative to its haploid parent. The general contribution of the two effects to steady

state is (D/H) = formation rate/(loss rate + fitness cost).

Figure 7. Fitness costs of duplications. Relative fitness is defined as the ratio of the growth rates of a duplication strain to that of its haploid parent (μD/μH). All tested duplications were isolated by the K-Kan selection (Appendix A) and had a duplication with KanR Lac− in one copy and KanS Lac+ in the other. Each point represents one independently isolated duplication mutant, whose presented relative growth rate is the average of three or more determinations (SD, 0.02). The median for each group of duplications is represented by a horizontal bar; the total number of different duplications assayed is in parentheses. Some duplications arose in recA+ cells and others in recA mutants. To minimize duplication loss during growth, rate measurements were made after addition of a recA mutation, which reduces growth slightly but is assumed to have no effect on the relative fitness of haploid and duplication-bearing strains.

An exact mathematical description of the accumulation process is derived in Appendix B. The expression in Equation 1 describes the relationship between the four variables measured (kF, kL, μH, and μD) and the steady-state ratio R = D/H, where D is the number of duplication-bearing cells and H is the number of haploid parent cells in a culture,

\[
\alpha = \left[\frac{\alpha}{\beta}\right] R + [1 - \alpha]\,R + R^{2}, \qquad (1)
\]

where

\[
\alpha = \frac{\mu_H}{\mu_D}\,\frac{k_F}{k_L} \quad \text{is the contribution of formation/loss}
\]

and

\[
\beta = \frac{\mu_H\,k_F}{\mu_H - \mu_D} \quad \text{is the contribution of fitness cost.}
\]

The term α represents the ratio of new duplications formed to duplications lost by segregation and thus describes the contribution of formation and loss to the steady-state R value. This rate depends on growth because it is assumed that formation and loss events occur during chromosome replication and should be expressed per cell division. The correction for different growth rates is relatively small in most cases. The term β relates the formation rate to the dilution rate attributable to fitness costs. Thus, it represents the contribution of fitness cost to the steady-state value.
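As a rough numerical illustration of Equation 1, the sketch below solves it for R. Multiplying Equation 1 through by (μD/μH)kL gives the quadratic w·kL·R² + [(1 − w) + w·kL − kF]·R − kF = 0, with w = μD/μH, a form that avoids dividing by kL when loss is negligible. The function names and the example inputs (of the order reported for argH in a recA+ strain) are assumptions chosen for illustration.

```python
import math

def steady_state_ratio(k_f, k_l, rel_fitness):
    """Positive root R = D/H of  w*k_l*R**2 + ((1 - w) + w*k_l - k_f)*R - k_f = 0."""
    w = rel_fitness
    a = w * k_l
    b = (1.0 - w) + w * k_l - k_f
    c = -k_f
    denom = b + math.sqrt(b * b - 4.0 * a * c)
    if denom <= 0.0:
        raise ValueError("no finite steady state: formation outpaces loss and cost")
    # Same value as (-b + sqrt(b^2 - 4ac)) / (2a), but also valid when a == 0.
    return -2.0 * c / denom

def approximate_ratio(k_f, k_l, rel_fitness):
    """Rule of thumb from the text: formation rate / (loss rate + fitness cost)."""
    return k_f / (k_l + (1.0 - rel_fitness))

# Hypothetical inputs: k_F = 1.93e-3, k_L = 8.0e-3, relative fitness = 0.77.
exact = steady_state_ratio(k_f=1.93e-3, k_l=8.0e-3, rel_fitness=0.77)
rough = approximate_ratio(k_f=1.93e-3, k_l=8.0e-3, rel_fitness=0.77)
print(f"exact R = {exact:.4e}, rule-of-thumb R = {rough:.4e}")
```

Setting `k_l=0.0` covers the recA case, where the steady state is set by formation and fitness cost alone.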
To visualize how formation/loss rates and fitness cost contribute to the steady state, one can plot log α vs. log β and examine the steady state and time to half steady state, t1/2, expected for each point in this design space (Savageau et al. 2009). In Figure 8, the value of log R is represented by color as described at the right. The sensitivity of R to changes in α and β defines four regions (regions A–D in Figure 8). In region A, the value of R depends heavily on relative fitness (μD/μH) and less on formation/loss rates (on changes in β and less on changes in α). In region B, R depends heavily on formation/loss rates (α) but less on fitness cost (β). Also shown in Figure 8 are the positions in this space of the loci assayed here. The experimental values for the argH region in a RecA+ strain place the point in region A of the design space, showing that fitness cost (β) is the major determinant of the steady-state duplication frequency. That is, the steady state rises as fitness cost decreases, but the steady state is not sensitive to changes in the formation/loss rates. The pyrD and lac regions in RecA+ strains are located in regions A and B, respectively, but near the boundary between these regions. Thus, these strains exhibit a mixed contribution from both fitness cost (β) and formation/loss rates (α). In recA mutant strains (open symbols), all regions tested fall within region A, where R is dictated by fitness cost (β). These results suggest that our initial focus exclusively on formation and loss rates was incorrect; fitness cost is a major factor in determining steady-state duplication frequency.

The time required to reach steady state can also be plotted as a heat map in the α–β design space (see Figure 9). Again all loci tested have values that fall into region A or B. In region A, the time required to reach half steady state (t1/2) increases as fitness cost (β) is reduced. That is, the time required to reach steady state is shortened by increasing fitness cost, but is not affected by the ratio of formation and loss rates (α) for the regions studied here.
In region B, changes in fitness cost (β) have little effect on t1/2; the approach to steady state is accelerated primarily by decreases in the ratio of formation rate (kF) to loss rate (kL). The values for the parameters summarized in Figures 8 and 9 are presented in Table 3 with the R values and the time to half steady state (t1/2) predicted by the mathematical analysis (Appendix B). Table 3 compares values of R and t1/2 estimated in various ways. The agreement between these approaches shows that the mathematical description fits well with unprocessed experimental data and with approximations made using the spreadsheet simulation. A very similar dependency

of steady-state duplication frequency on the four parameters was obtained when the process of duplication accumulation was modeled using a Monte Carlo simulation (data not shown). The concordance of several approaches suggests that the behavior of duplication frequency is well explained by the combination of relative fitness (μD/μH) and relative rates of formation and loss (kF/kL).

Figure 8. Design space diagram for visualizing effects of fitness and formation/loss on steady-state duplication frequency.

Figure 9. Design space diagram for visualizing effects of relative fitness (β) and formation/loss rates (α) on time required for duplication frequency (R) to reach one-half of the final steady state (t1/2).

DISCUSSION

The results presented here reveal several unexpected properties of duplication mutations:

1. Duplication formation (kF) depends only weakly on recombination, even when junctions reflect exchanges between rather extensive separated sequence repeats (1–6.5 kb). This observation suggests that mechanisms other than homologous recombination can underlie duplication formation (even when long repeats are involved). We suggest that this occurs by a stepwise process initiated by a tandem inversion duplication, in which deletion events generate the junctions found in the final simple duplication (Kugelberg et al. 2006; E. Kugelberg, unpublished results).
2. Duplication loss rate (kL) depends heavily on recombination, probably because exchanges occur between very long sequence repeats (>40 kb). Loss is frequent because of the length of these sequences and the use of an active system of homologous recombination. Longer duplications are expected to have higher loss rates since they present larger targets for recombination.
3. Duplications have a surprisingly high, but locus-dependent fitness cost. Most previous discussion of bacterial duplication frequency in rich medium has assumed neutrality, but the steady states described here depend heavily on fitness costs as well as high formation rates.
The fitness cost of duplication has been recognized for higher organisms since the advent of genome sequences (Emerson et al. 2008; Conrad et al. 2009). The variability of cost for different duplications suggests that cost is due to toxicity of particular gene products when overproduced or when present in an inappropriate ratio to the levels of other products. One may expect that in general larger duplications will have higher costs simply because they are more likely to include toxic genes. We cannot exclude the possibility that the growth deficits measured here for duplications are due to occasional lethality of duplication-bearing strains. The fitness costs were estimated in strains with a recA mutation, which is known to cause some cell death. It is possible that the recA defect causes more frequent cell death in duplication cells than in haploid cells.
4. Duplication frequency comes to a steady state dictated by a high formation rate (kF) balanced in part by a high loss rate (kL), but primarily by the high fitness cost of duplications. During the period of 30 generations required for a single cell to form a saturated 1-ml culture or a large colony on solid medium (10⁹ cells), duplication frequency for a particular locus reaches >40% of the steady-state level.

Duplications can form between extensive repeats without RecA: Duplications form frequently even without the benefit of recombination (RecA). This is true even for regions between fairly extensive repeats such as the chromosomal argH locus flanked by repeated rrn loci (6.5 kb) and the lac locus flanked by IS3 sequences (1.3 kb) on plasmid F′128. At least some duplications of these

TABLE 3
Comparison of observed and analytical values for duplication steady state

Columns: R33 = D/H ratio after 33 generations (×10⁻⁴)^a; loss rate kL (×10⁻³)^b; relative fitness μD/μH (average)^c; formation rate kF (×10⁻⁵); steady-state frequency R = D/H (×10⁻⁴); time to one-half steady state (t1/2) (in generations).

Locus    RecA       R33^a   kL^b      μD/μH^c   kF: Raw^d  Simulation^e  Analytic^f   R: Raw^g  Simulation^h  Analytic^i   t1/2: Raw^j  Simulation^k  Analytic^l
pyrD     recA+      0.51    14.0      0.96      0.31       0.46          0.39         0.70      0.85          0.72         23           25            18.7
pyrD     recA       0.17    <0.001    0.94      0.10       0.16          0.14         0.20      0.27          0.23         20           23            16.7
F′ lac   recA+      28.7    44.0      0.97      17.4       29.9          25.7         38.7      40.3          35.4         22           19            13.8
F′ lac   recA       3.5     0.9       0.97      2.12       2.69          2.13         7.9       8.7           6.9          37           45            32.4
argH     recA+      81.8    8.0       0.77      50.0       198.5         193.0        75.0      83.4          82.4         15           6             4.27
argH     recA       71.0    <0.001    0.74      43.4       192.5         184.0        80.0      74.0          71.3         18           6             3.87

a These values were determined directly.
b These kL values were determined with the aid of a spreadsheet simulation to correct for growth rate differences. Duplications described were isolated in either recA or recA+ as indicated.
c Relative fitness was determined in recA mutant derivatives using a spreadsheet to correct for loss rate. For the pyrD and lac loci, each presented value is the median value for several independent duplication strains. Because the argH duplications fell into two classes, the average value is presented.
d This value uses the determined R33 to define an uncorrected initial rate.
e This kF value was determined using a spreadsheet simulation. After 33 generations in the simulation, this kF value produces a duplication frequency equal to that observed after 33 generations of growth.
f This kF value was calculated using the mathematical description (Appendix B) and solving for kF using measured values of fitness and kL with the measured values of R after 33 generations.
g This value of R is determined by direct observation of the data points in Figure 5, without any corrections.
h This value of R is predicted by the spreadsheet simulation using a kF that produces the observed frequency at generation 33 (see plot in Figure 5).
i This value of R is calculated using the mathematical description (Appendix B), the raw unmanipulated value of R33, and the assayed values of kL and relative fitness.
j This is estimated on the basis of an approach to the raw steady-state frequency and the raw kF.
k This is estimated on the basis of the trajectory of the spreadsheet simulation obtained using the corrected kF and the measured relative fitness.
l This value is calculated using the mathematical description and on the basis of unmanipulated values of R33, kL, and relative fitness.
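The analytic kF and t1/2 columns of Table 3 can be approximated with a short calculation. This is a sketch and not the derivation of Appendix B: it assumes R is much smaller than 1, so that the R² term can be dropped and the ratio after g haploid generations becomes R(g) = (kF/λ)(1 − 2^(−λg)), with λ = (1 − μD/μH) + (μD/μH)kL − kF. Under that assumption the time to one-half steady state is 1/λ generations. The function names, the bisection bounds, and the iteration count are illustrative choices.

```python
import math

def relaxation_rate(k_f, k_l, rel_fitness):
    """lambda = (1 - w) + w*k_l - k_f, in units of the haploid growth rate."""
    return (1.0 - rel_fitness) + rel_fitness * k_l - k_f

def ratio_after(k_f, k_l, rel_fitness, generations):
    """Linearized D/H after the given number of haploid generations."""
    lam = relaxation_rate(k_f, k_l, rel_fitness)
    tau = generations * math.log(2.0)
    if abs(lam) < 1e-12:
        return k_f * tau                      # limit as lambda -> 0
    return k_f * -math.expm1(-lam * tau) / lam

def half_time_generations(k_f, k_l, rel_fitness):
    """Generations needed to reach one-half of the steady-state ratio."""
    lam = relaxation_rate(k_f, k_l, rel_fitness)
    if lam <= 0.0:
        raise ValueError("no steady state under the linearized model")
    return 1.0 / lam

def fit_formation_rate(r_obs, k_l, rel_fitness, generations=33, iters=200):
    """Bisection on k_f so that the modeled D/H matches the observed value."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ratio_after(mid, k_l, rel_fitness, generations) < r_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Inputs taken from the F' lac, recA row of Table 3 (rounded as printed).
k_f_lac = fit_formation_rate(r_obs=3.5e-4, k_l=0.9e-3, rel_fitness=0.97)
t_half_lac = half_time_generations(k_f_lac, 0.9e-3, 0.97)
print(f"fitted k_F = {k_f_lac:.3e}, t1/2 = {t_half_lac:.1f} generations")
```

Because the table entries are rounded, the results should land near the analytic columns and need not match them exactly.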

regions have hybrid rrn or IS3 sequences at their junction (data not shown). A model for RecA-independent formation of duplications between large and small sequence repeats will be described in detail elsewhere. In this model, a RecA-independent mechanism generates frequent large deleterious structures (symmetrical tandem inversion duplications, sTID) of the form ABCDD′C′B′A′ABCE (where each letter represents a block of sequence). These sTID structures are usually lost, but can be stabilized by join point deletions (Kugelberg et al. 2006; E. Kugelberg, unpublished results). Deletions extending between different points in the flanking direct repeats remove both inversion junctions and generate a new junction of a simple duplication. Duplication formation can be RecA independent regardless of the nature of the junction sequences, because RecA is not needed for either the initial sTID or the deletions that generate the ultimate duplication junction.

Stable polymorphisms in bacterial populations: The process by which duplication frequency reaches a steady state should, in principle, affect all mutations. That is, mutation frequency increases due to formation and drops due to reversion and counterselection. However, point mutations arise and revert at very low rates and a steady state is expected only after extensive growth periods. The steady-state duplication frequency should be viewed as a stable genetic polymorphism. Roughly 1% of cells in an unselected population have a duplication of a region flanked by the most closely located rrn loci (rrnC, -A, -B, and -E) and 0.1% have a duplication of any particular gene outside of this region (Anderson and Roth 1981). About 10% of cells in an unselected population have a duplication of some unspecified chromosomal region. The region of F′128 that includes lac is duplicated in 1 in 500 cells and this polymorphism contributes to the rapid accumulation of lac+ revertants during selective growth of a strain carrying a lac mutation on F′128 lac (Roth et al. 2006).
The steady-state duplication frequency described here is predicted to also apply to higher copy number variants of the affected locus. Recombination events that cause duplication loss can also generate higher levels of repeat amplification (Figure 2) and are expected to occur at the same rate (10⁻²/cell/division). The mathematical treatment described here can predict the steady-state frequency of cells with higher copy numbers. If one assumes that fitness costs of added copies accrue by a constant factor, it is estimated for the lac region of F′128 that in an unselected steady-state population, one cell in 10⁶ would have eight or more lac copies. Thus when 10⁸ cells are plated on medium selecting for increases in lac level, 100 of those cells are expected to have many lac copies at the moment of plating.

Figure 10. Effects of selection and duplication remodeling on steady-state frequency. The steady-state duplication frequency is determined (at left) by a balance between rate of formation (kF) on one hand and combined rates of loss (kL) and counterselection on the other hand. This steady-state frequency is expected to increase if conditions change so that some gene in the duplicated region provides a fitness increase. The frequency increase is expected even if the benefit is smaller than the cost, because cost is offset by formation. Frequency is also expected to increase if the size of the repeated unit is reduced by deletions, which should reduce both fitness cost and the rate of duplication loss kL.

Implications of duplication polymorphisms for the effect of selection on bacterial populations: Point mutations that increase the level of a particular enzymatic activity are rare, arising at perhaps 10⁻⁹/cell/division (affecting a few base pairs of the relevant gene). Yet increases in the level of that activity are easily achieved by the copy number changes discussed here, whose formation rates are 1000- to 10⁶-fold higher. In addition, the polymorphisms described above provide a high frequency of potentially beneficial duplications to even a very small population.
Such variants are already present at substantial frequency when selective conditions are first imposed. Selection detects small improvements and causes an exponential increase in mutant frequency. Since further copy number increases occur at such a high rate (order of 10⁻²/cell/division), amplification is quickly favored as the population expands. The fitness cost of duplications, noted here, does not prevent these selective effects. As diagrammed in Figure 10, positive selection for increased copy number causes an increase in duplication frequency even if the benefit provided under selection is less than the underlying fitness cost of the duplication. That is, duplication steady-state frequency prior to selection is a balance between net formation rate (relative to loss) and fitness cost. When selective conditions allow extra copies to provide a growth improvement, duplication frequency is expected to increase (due to the offset of general cost). Further increases in the steady-state frequency can be enabled by deletions that reduce the size of the duplicated region, but leave the positively selected gene. Deletions shorten the repeated unit and thereby reduce both fitness cost

and the rate of duplication loss. This selective remodeling of duplications has been described previously (Kugelberg et al. 2006).

We thank Allan Campbell, who pointed out that our initial thinking about steady-state duplication frequency was numerically unsanitary. We believe that inclusion of fitness cost will satisfy his concerns. A.B.R. thanks Rosemary Redfield for instruction in use of spreadsheet simulations in the study of bacterial populations. J.R.R. thanks Richard Lewontin for asserting (30 years ago) that bacteria are boring subjects for population genetics because there can be no stable polymorphisms without sex and diploidy. His comment has been worrisome, ever since, but perhaps the steady states described here will help fill the bacterial polymorphism gap. This work was supported by National Institutes of Health grant GM27068.

LITERATURE CITED

Anderson, P., and J. Roth, 1981 Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proc. Natl. Acad. Sci. USA 78: 3113–3117.
Anderson, R. P., and J. R. Roth, 1977 Tandem genetic duplications in phage and bacteria. Annu. Rev. Microbiol. 31: 473–505.
Anderson, R. P., and J. R. Roth, 1979 Gene duplication in bacteria: alteration of gene dosage by sister-chromosome exchanges. Cold Spring Harbor Symp. Quant. Biol. 43(Pt. 2): 1083–1087.
Andersson, D. I., E. S. Slechta and J. R. Roth, 1998 Evidence that gene amplification underlies adaptive mutability of the bacterial lac operon. Science 282: 1133–1135.
Bachellier, S., J. M. Clement and M. Hofnung, 1999 Short palindromic repetitive DNA elements in enterobacteria: a survey. Res. Microbiol. 150: 627–639.
Bergthorsson, U., D. I. Andersson and J. R. Roth, 2007 Ohno's dilemma: evolution of new genes under continuous selection. Proc. Natl. Acad. Sci. USA 104: 17004–17009.
Cairns, J., and P. L. Foster, 1991 Adaptive reversion of a frameshift mutation in Escherichia coli. Genetics 128: 695–701.
Carter, J. R., and R. D. Porter, 1991 traY and traI are required for oriT-dependent enhanced recombination between lac-containing plasmids and lambda plac5. J. Bacteriol. 173: 1027–1034.
Carter, J. R., D. R. Patel and R. D. Porter, 1992 The role of oriT in tra-dependent enhanced recombination between mini-F-lac-oriT and lambda plac5. Genet. Res. 59: 157–165.
Conrad, D. F., D. Pinto, R. Redon, L. Feuk, O. Gokcumen et al., 2009 Origins and functional impact of copy number variation in the human genome. Nature October 7, 2009 doi: 10.1038/nature08516.
Court, D. L., J. A. Sawitzke and L. C. Thomason, 2002 Genetic engineering using homologous recombination. Annu. Rev. Genet. 36: 361–388.
Emerson, J. J., M. Cardoso-Moreira, J. O. Borevitz and M. Long, 2008 Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science 320: 1629–1631.
Hendrickson, H., E. S. Slechta, U. Bergthorsson, D. I. Andersson and J. R. Roth, 2002 Amplification-mutagenesis: evidence that directed adaptive mutation and general hypermutability result from growth with a selected gene amplification. Proc. Natl. Acad. Sci. USA 99: 2164–2169.
Kidd, J. M., G. M. Cooper, W. F. Donahue, H. S. Hayden, N. Sampas et al., 2008 Mapping and sequencing of structural variation from eight human genomes. Nature 453: 56–64.
Kofoid, E., U. Bergthorsson, E. S. Slechta and J. R. Roth, 2003 Formation of an F′ plasmid by recombination between imperfectly repeated chromosomal Rep sequences: a closer look at an old friend (F′(128) pro lac). J. Bacteriol. 185: 660–663.
Korbel, J. O., A. E. Urban, J. P. Affourtit, B. Godwin, F. Grubert et al., 2007 Paired-end mapping reveals extensive structural variation in the human genome. Science 318: 420–426.
Kugelberg, E., E. Kofoid, A. B. Reams, D. I. Andersson and J. R. Roth, 2006 Multiple pathways of selected gene amplification during adaptive mutation. Proc. Natl. Acad. Sci. USA 103: 17319–17324.
Roth, J. R., E. Kugelberg, A. B. Reams, E. Kofoid and D. I. Andersson, 2006 Origin of mutations under selection: the adaptive mutation controversy. Annu. Rev. Microbiol. 60: 477–501.
Sandegren, L., and D. I. Andersson, 2009 Bacterial gene amplification: implications for the evolution of antibiotic resistance. Nat. Rev. Microbiol. 7: 578–588.
Savageau, M. A., P. M. B. M. Coelho, R. Fasani, D. Tolla and A. Salvador, 2009 Phenotypes and tolerances in the design space of biochemical systems. Proc. Natl. Acad. Sci. USA 106: 6435–6440.
Seifert, H. S., and R. D. Porter, 1984a Enhanced recombination between lambda plac5 and F42lac: identification of cis- and trans-acting factors. Proc. Natl. Acad. Sci. USA 81: 7500–7504.
Seifert, H. S., and R. D. Porter, 1984b Enhanced recombination between lambda plac5 and mini-F-lac: the tra regulon is required for recombination enhancement. Mol. Gen. Genet. 193: 269–274.
Sharp, A. J., D. P. Locke, S. D. McGrath, Z. Cheng, J. A. Bailey et al., 2005 Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77: 78–88.
Silver, L., M. Chandler, H. E. Lane and L. Caro, 1980 Production of extrachromosomal r-determinant circles from integrated R100.1: involvement of the E. coli recombination system. Mol. Gen. Genet. 179: 565–571.
Slechta, E. S., K. L. Bunny, E. Kugelberg, E. Kofoid, D. I. Andersson et al., 2003 Adaptive mutation: general mutagenesis is not a programmed response to stress but results from rare coamplification of dinB with lac. Proc. Natl. Acad. Sci. USA 100: 12847–12852.
Sonti, R. V., and J. R. Roth, 1989 Role of gene duplications in the adaptation of Salmonella typhimurium to growth on limiting carbon sources. Genetics 123: 19–28.
Sun, S., O. G. Berg, J. R. Roth and D. I. Andersson, 2009 Contribution of gene amplification to evolution of increased antibiotic resistance in Salmonella typhimurium. Genetics 182: 1183–1195.
Syvanen, M., J. D. Hopkins, T. J. T. Griffin, T. Y. Liang, K. Ippen-Ihler et al., 1986 Stimulation of precise excision and recombination by conjugal proficient F′ plasmids. Mol. Gen. Genet. 203: 1–7.
Tlsty, T. D., A. M. Albertini and J. H. Miller, 1984 Gene amplification in the lac region of E. coli. Cell 37: 217–224.
Tlsty, T. D., B. H. Margolin and K. Lum, 1989 Differences in the rates of gene amplification in nontumorigenic and tumorigenic cell lines as measured by Luria-Delbruck fluctuation analysis. Proc. Natl. Acad. Sci. USA 86: 9441–9445.
Williams, P. A., R. J. Ingebretsen and R. J. Dawson, 2006 14.6 mT ELF magnetic field exposure yields no DNA breaks in model system Salmonella, but provides evidence of heat stress protection. Bioelectromagnetics 27: 445–450.
Yu, D., J. A. Sawitzke, H. Ellis and D. L. Court, 2003 Recombineering with overlapping single-stranded DNA oligonucleotides: testing a recombination intermediate. Proc. Natl. Acad. Sci. USA 100: 7207–7212.

Communicating editor: M. Long

APPENDIX A: ASSAYS FOR DUPLICATION FREQUENCY

Three distinct assay methods were used to determine the frequency of cells in a population that carry a duplication of a particular locus. Each of these assays can be used in strains that are deficient in recombination ability (recA). While most reported data were obtained using the K-Kan assay, congruent results were obtained with the others.

The T-Recs assay: This assay is an improved version of an earlier method in which a transductional cross traps preexisting duplications in the recipient cell population
(Anderson and Roth 1981). In the modified version, the Tn10 is replaced by a variant element (T-Recs) that includes a recA⁺ gene and a chloramphenicol resistance (Cmᴿ) cassette. The recA⁺ gene is constitutively expressed due to a point mutation in its operator region. An inserted T-Recs element can be transduced into a recipient strain lacking RecA function. The transduced fragment carrying the T-Recs element expresses the RecA protein required for recombination with the chromosome. Thus a recA mutant population, grown without RecA function, can be assayed for accumulated duplications by selecting Cmᴿ of T-Recs and scoring the fraction of transductants that retain both the functional recipient allele and the defective donor allele. Recombination ability is restored only during the period of the assay. The chromosomal genes analyzed for duplication formation are ones required for growth on minimal medium (argH and pyrD). The plasmid gene lacZ is required for growth on lactose. The donor strains carried a T-Recs (Cmᴿ, RecA⁺) element inserted into the assay locus (e.g., argH, pyrD, or lacZ). Transductants resistant to chloramphenicol (Cmᴿ) are selected on LB chloramphenicol medium (see Figure A1). Duplications were detected by replica printing the transduction plates to minimal chloramphenicol medium. Most recipient cells are haploid for the assayed locus and inherit with Cmᴿ the growth defect of the donor (Arg⁻, Pyr⁻, or Lac⁻). Recipient cells with a duplication of the assayed locus inherit Cmᴿ but remain phenotypically Arg⁺, Pyr⁺, or Lac⁺ by virtue of the unaffected second copy. Such trapped duplications can be selectively maintained on minimal medium with chloramphenicol. The Cmᴿ transductants that do not acquire the metabolic defect of the donor are scored as duplications. As originally done, this assay requires recombination-proficient (RecA⁺) cells to support the transduction cross that traps the duplication, but the use of T-Recs insertions allows the assay to be done in a recA mutant recipient.
To assay duplication of lac on plasmid F′128, the donor carried a Kanᴿ replacement of lacIZ and a T-Recs (Cmᴿ RecA⁺) element inserted in lacA. Transductants were selected on rich medium containing kanamycin and X-gal. The duplication frequency was determined by counting the number of blue (duplications) and white colonies (haploids). The T-Recs assay may slightly underestimate duplication frequencies, since occasional transductants into a duplication-bearing recipient may inherit the donor Cmᴿ (RecA) element in place of both copies of the duplicated region. This problem seems to be small since very similar duplication frequencies are inferred when this assay is compared to those described below.

The K-Kan assay: This assay detects duplications using a transformation cross, mediated by the recombination (Red) functions of phage lambda (recombineering), provided by a plasmid (pSIM5) that is carried (with repressed Red genes) by all strains being assayed (Court et al. 2002). The strategy is described in Figure A2. The locus whose duplication frequency is to be tested is disrupted by insertion of a lac operon whose lacI gene has an inserted kanamycin-resistance determinant lacking a promoter (this is designated K to indicate lack of expression). A strain carrying this insertion expresses LacZ constitutively due to its repressor defect. The donated fragment carries a promoter for the K gene flanked on one side by sequence within the K determinant and on the other side by sequence in the middle of the lacZ gene. Inheritance of the donor promoter generates a Kanᴿ phenotype and simultaneously creates a lacZ deletion (as diagrammed in Figure A2). In a recipient cell that is haploid for the test locus, this transformation generates selectable Kanᴿ transformants with a Lac⁻ phenotype. In recipient cells with a duplication of the test locus, the recombination event produces the Kanᴿ lacZ deletion in one copy, while the undisturbed other copy provides a Lac⁺ phenotype. Thus the fraction of Lac⁺ cells among the Kanᴿ transformants indicates the frequency of lac duplications in the recipient population.
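The frequency calculation shared by these assays (duplication-bearing colonies divided by all selected colonies) can be sketched as follows. This is only an illustration: the function name and the colony counts are hypothetical, and the Wilson score interval is added here as one reasonable way to express counting uncertainty; the text does not state which interval, if any, was used.

```python
import math

def duplication_frequency(dup_colonies, total_colonies, z=1.96):
    """Return (frequency, lower, upper) for a colony-count assay.

    dup_colonies   -- colonies scored as carrying a duplication
                      (e.g., Lac+ among Kan-resistant transformants)
    total_colonies -- all selected colonies that were scored
    z              -- normal quantile; 1.96 gives a ~95% Wilson interval
                      (the interval is an assumption, not from the paper)
    """
    if total_colonies <= 0 or not 0 <= dup_colonies <= total_colonies:
        raise ValueError("counts must satisfy 0 <= dup <= total and total > 0")
    n = total_colonies
    p = dup_colonies / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts: 12 Lac+ colonies among 4000 Kan-resistant transformants.
freq, low, high = duplication_frequency(12, 4000)
print(f"frequency = {freq:.4f}, interval = ({low:.4f}, {high:.4f})")
```

Because duplication-bearing colonies are rare, the interval is wide relative to the point estimate unless several thousand colonies are scored.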
Since Red-mediated transformation (recombineering) does not require RecA function, this assay can be used in recA mutant cells. This assay requires that the cells being tested carry a plasmid encoding the Red functions. These functions are not expressed during growth of the culture and duplication accumulation, but are induced in the course of the transformation cross used to detect duplications. The Red functions are not expected to contribute to duplication formation or loss (even if induced) since induction of these functions does not increase duplication segregation in recA⁺ strains or allow segregation in strains carrying a recA mutation. The frequency of transformants observed following Red induction seems to depend heavily on the high copy number of the introduced fragment. This assay was done using a synthesized 130-base single-stranded donor fragment. On the basis of work of Court and colleagues, these crosses appear to affect a single replication fork region and thus are not expected to remove preexisting duplications as can occur in the transduction assays above (Yu et al. 2003). In this assay, the duplication is not selectively held, since selection is made for Kanᴿ and the Lac phenotype is scored visually. It is possible that this assay misses duplications that happen to be lost early in the growth of the transformant clone, leading to underestimation of duplication frequency. This problem is avoided in the next assay, in which both copies are held selectively.

The drug-in-drug assay: This assay detects and holds duplications selectively from the moment of the assay
transformation. As in the K-Kan assay, duplications are trapped by a Red-mediated recombineering cross. The essence of the assay (see Figure A3) is that the assay site carries a resistance determinant for tetracycline (Tetᴿ) that is inactivated by insertion of a kanamycin-resistance determinant (Kanᴿ). Thus the assay strain is phenotypically Kanᴿ Tetˢ. To assay duplication frequency, a short single-strand fragment (80 bases) is introduced that restores Tetᴿ by excising the Kanᴿ determinant from the test locus. A haploid recipient cell will become Tetᴿ, Kanˢ. However, a recipient with a duplication of the test locus will acquire Tetᴿ by an exchange in one allele and remain Kanᴿ by virtue of the unaffected second copy of the test locus. The transformed culture was plated on tetracycline medium to reveal the total transformant number. After overnight growth these plates were replica printed to medium with both tetracycline and kanamycin to reveal the fraction of the total that carry duplications. The counted duplications were kept under selection and have little opportunity to lose the duplication before they can be counted. In some assays, the recipient Tetᴿ locus was the TetA, TetR region of a Tn10 insertion, and in others the tetA tetR genes were PCR amplified from Tn10 and inserted directly into the test locus.

Figure A1. The T-Recs method for detecting duplications in a recA mutant strain. A transduction cross introduces a copy of the argH region with an inserted Cmᴿ marker (T-Recs). This insertion includes a recA⁺ gene in addition to the determinants for chloramphenicol resistance. When transduced into a recA mutant recipient, this element expresses RecA, which can support the recombination needed for acquisition of the donor insertion mutation. The duplication frequency is the fraction of Cmᴿ transductants that remain Arg⁺.

Figure A2. The K-Kan assay: Red-mediated transformation to detect duplications in the absence of RecA. In this assay, a donated fragment includes a promoter for the Kan resistance gene and flanking homology to the Kan gene on the left and to the lacZ gene on the right. Inheritance of this fragment converts a Kanˢ Lac⁺ allele to a Kanᴿ Lac⁻ allele. Recipients with two copies of the lac region acquire Kanᴿ (in one copy) without loss of the Lac⁺ phenotype (provided by the other copy).

Figure A3. The drug-in-drug assay for detecting duplications in recA mutant strains. The locus to be assayed carries an inserted Tetᴿ gene that is disrupted by a Kanᴿ cassette. The assay strain carries this compound allele and a plasmid that encodes the Red functions of phage lambda (Yu et al. 2003). The assay consists of a transformation cross in which a single-stranded donor fragment is electroporated into the recipient assay strain. This fragment displaces the Kanᴿ determinant and repairs the Tetᴿ determinant, thereby converting a Kanᴿ, Tetˢ allele into a Kanˢ, Tetᴿ allele. Recipients with a duplication of the test locus gain Tetᴿ (in one copy of the region) while retaining the Kanᴿ phenotype (provided by the other copy).

APPENDIX B: MATHEMATICAL MODELING OF DUPLICATION ACCUMULATION

The process of duplication accumulation was described, making the assumption that events of duplication formation or loss occur in a replication-dependent way and thus will depend on the relative growth rates of haploid and duplication-bearing strains. The basic model is described in Figure B1. The equations describing the temporal behavior of the populations in Figure B1 are the following:

$$\frac{dH}{dt} = \mu_H (1 - k_F) H + \mu_D k_L D$$

$$\frac{dD}{dt} = \mu_D (1 - k_L) D + \mu_H k_F H.$$

These can be used to characterize the duplication frequency as a function of time. By combining the above equations, one obtains

$$\frac{dR}{dt} = \frac{d(D/H)}{dt} = \frac{1}{H}\frac{dD}{dt} - \frac{D}{H^2}\frac{dH}{dt}$$

or

$$\frac{dR}{dt} = \mu_H k_F + \mu_D (1 - k_L) R - \mu_H (1 - k_F) R - \mu_D k_L R^2.$$

At steady state, $dR/dt = 0$, and the resulting expression for the steady-state duplication frequency $R_{ss}$ can be written as

$$\alpha = \frac{\alpha}{\beta} R_{ss} + [1 - \alpha] R_{ss} + R_{ss}^2,$$

where

$$\alpha = \frac{\mu_H}{\mu_D}\,\frac{k_F}{k_L} \quad \text{and} \quad \beta = \frac{\mu_H k_F}{(\mu_H - \mu_D)}.$$

Figure B1. The diagrammed process shows the number of haploid cells (H) increases by growth ($\mu_H$) and when a duplication-bearing cell loses its duplication ($\mu_D k_L$). Haploid cells can be lost when a duplication arises ($\mu_H k_F$). Duplication cell number (D) increases by growth ($\mu_D$) and when a duplication forms in a haploid cell ($\mu_H k_F$). Duplications can be lost by reversion (i.e., recombination between repeats) as diagrammed in Figure 2 ($\mu_D k_L$). The duplication frequency ($R = D/H$) rises to a steady state ($R_{ss}$) at which point the difference between duplication formation and loss is balanced by the differences between diploid and haploid growth rates.

The parameter $\alpha$ describes the contribution of duplication formation and loss ($k_F/k_L$) to the steady state if corrected for relative growth rates ($\mu_H/\mu_D$). The extreme is the situation on the bottom right in Figure B2 when there is negligible fitness cost and there is a significant rate of duplication loss. The parameter $\beta$ describes the contribution of fitness cost of duplication to the steady state. The extreme is the situation on the top left in Figure B2 when $k_L$ is near zero (as in a recA mutant strain) and there is a large fitness cost. The steady-state duplication frequency ($R_{ss}$) can exist in several different regions of this design space, depending on the values of the two aggregate parameters $\alpha$ and $\beta$. In each of these regions, the value of $R_{ss}$ shows a characteristic sensitivity to changes of the several parameters. These regions are described below and then graphed.

In region A, steady-state duplication frequency depends most heavily on fitness cost:

$$R_{ss} \approx \beta = \frac{\mu_H k_F}{(\mu_H - \mu_D)}.$$

The boundaries of the region in which this form of $R_{ss}$ is dominant are

$$\beta < \frac{\alpha}{1 - \alpha} \quad \text{when } \alpha < 0.382,$$

$$\beta < \sqrt{\alpha} \quad \text{when } 0.382 < \alpha \le 1,$$

and a third upper bound on $\beta$, expressed in terms of $\sqrt{\alpha}$, when $\alpha > 1$.

In region B, steady-state duplication frequency depends most heavily on relative rates of formation and loss:

$$R_{ss} \approx \frac{\alpha}{(1 - \alpha)} = \frac{(\mu_H/\mu_D)(k_F/k_L)}{[1 - (\mu_H/\mu_D)(k_F/k_L)]}.$$

The boundaries of the region in which this form is dominant are

$$\alpha < 0.382 \quad \text{and} \quad \beta > \frac{\alpha}{1 - \alpha}.$$

In region C, steady-state duplication frequency again depends most heavily on relative rates of formation and loss:

$$R_{ss} \approx (\alpha - 1) = \frac{\mu_H k_F}{\mu_D k_L} - 1.$$

The boundary in this case is a lower bound on $\beta$, expressed in terms of $\sqrt{\alpha}$, that applies when $\alpha > 1.27$.

Finally, in region D, steady-state duplication frequency depends on yet a different form of expression involving the aggregate parameter $\alpha$,

$$R_{ss} \approx \sqrt{\alpha} = \sqrt{\frac{\mu_H k_F}{\mu_D k_L}},$$

and the boundaries are given by a lower and an upper bound on $\beta$ (both expressed in terms of $\sqrt{\alpha}$) when $\alpha > 1.27$, a lower bound on $\beta$ when $1 < \alpha < 1.27$, and

$$\beta > \sqrt{\alpha} \quad \text{when } 0.382 < \alpha < 1.$$

Figure B2. Graphical representation of the design space (Savageau et al. 2009) for the model in Figure B1. The horizontal axis represents the contribution of fitness cost (proportional to $1/\beta$) to the steady-state duplication frequency. The vertical axis represents the contribution of duplication formation and loss (proportional to $\alpha$) to the steady-state duplication frequency. The boundaries between regions in which the steady-state duplication frequency $R_{ss}$ is dominated by different forms of the solution are shown as heavy black lines.

The four regions are diagrammed in Figure B2. In the body of this article, Figure B2 is repeated with $R_{ss}$ values represented in a heat map and the position of each locus studied here indicated in the plot. This graphical method has been described previously (Savageau et al. 2009).
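The approach to steady state in this model can be checked numerically. The sketch below integrates the rate equation for R = D/H with a fixed-step fourth-order Runge-Kutta scheme and compares the result with the positive root of dR/dt = 0. The function names, the integration scheme, and all parameter values are illustrative assumptions, not the authors' simulation or measurements; time is in arbitrary units in which the haploid growth rate is 1.

```python
import math

def ratio_rate(r, mu_h, mu_d, k_f, k_l):
    """dR/dt for R = D/H, obtained by combining the H and D equations."""
    return (mu_h * k_f
            + (mu_d * (1 - k_l) - mu_h * (1 - k_f)) * r
            - mu_d * k_l * r * r)

def steady_state_ratio(mu_h, mu_d, k_f, k_l):
    """Non-negative root of dR/dt = 0, written to avoid cancellation."""
    a = mu_d * k_l                              # coefficient of -R^2
    b = mu_d * (1 - k_l) - mu_h * (1 - k_f)     # coefficient of R
    c = mu_h * k_f                              # constant term
    disc = math.sqrt(b * b + 4 * a * c)
    if b < 0:
        return 2 * c / (disc - b)               # also valid when k_l == 0
    if a == 0:
        raise ValueError("no finite steady state without loss or fitness cost")
    return (b + disc) / (2 * a)

def simulate_ratio(r0, t_end, dt, **params):
    """Integrate dR/dt from R(0) = r0 to t_end with fixed-step RK4."""
    r = r0
    for _ in range(int(round(t_end / dt))):
        k1 = ratio_rate(r, **params)
        k2 = ratio_rate(r + 0.5 * dt * k1, **params)
        k3 = ratio_rate(r + 0.5 * dt * k2, **params)
        k4 = ratio_rate(r + dt * k3, **params)
        r += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return r

# Hypothetical values: 5% fitness cost, formation 1e-4, loss 1e-2 per division.
params = dict(mu_h=1.0, mu_d=0.95, k_f=1e-4, k_l=1e-2)
r_ss = steady_state_ratio(**params)
r_sim = simulate_ratio(0.0, 500.0, 0.1, **params)

# Aggregate parameters as defined in the text.
alpha = (params["mu_h"] / params["mu_d"]) * (params["k_f"] / params["k_l"])
beta = params["mu_h"] * params["k_f"] / (params["mu_h"] - params["mu_d"])

print(f"alpha = {alpha:.4g}, beta = {beta:.4g}")
print(f"analytic R_ss = {r_ss:.6g}, simulated R(500) = {r_sim:.6g}")
```

With these assumed values the fitness-cost term dominates, so the computed steady state lies near β, as expected for region A. Note that β is undefined when the two growth rates are equal, so the sketch should not be used to compute β for a cost-free duplication.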