The effect of transcription and translation initiation frequencies on the stochastic fluctuations in prokaryotic gene expression.

JBC Papers in Press. Published on November 2, 2000 as Manuscript M006264200

Andrzej M. Kierzek*,1, Jolanta Zaim1, Piotr Zielenkiewicz1,2

1 Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Pawinskiego 5a, 02-106 Warszawa, POLAND
2 Institute of Experimental Plant Biology, Warsaw University, Pawinskiego 5a, 02-106 Warszawa, POLAND

Running title: Stochastic effects in prokaryotic gene expression.

* Corresponding author: e-mail: andrzejk@ibbrain.ibb.waw.pl, tel: (48 22) 658 47 03, fax: (48 39) 12 16 23

Copyright 2000 by The American Society for Biochemistry and Molecular Biology, Inc.

SUMMARY
The kinetics of prokaryotic gene expression has been modelled with the Monte Carlo simulation algorithm of Gillespie, which allowed us to study random fluctuations in the number of protein molecules during gene expression. The model, applied to the simulation of LacZ gene expression, is in good agreement with experimental data. The influence of the frequencies of transcription and translation initiation on the random fluctuations in gene expression has been studied in a number of simulations in which promoter and ribosome binding site effectiveness was varied over the range of values reported for various prokaryotic genes. We show that genes expressed from strong promoters produce the protein evenly, at a rate that does not vary significantly among cells. Genes with very weak promoters express the protein in "bursts" occurring at random time intervals. Therefore, if a low level of gene expression results from a low frequency of transcription initiation, large fluctuations arise. In contrast, the protein can be produced at a low and uniform rate if the gene has a strong promoter and a slow rate of ribosome binding (a weak ribosome binding site). The implications of these findings for the expression of regulatory proteins are discussed.

INTRODUCTION
Stochastic fluctuations in gene expression have drawn the attention of several research groups. It has been postulated (1, 2) and observed (3) that proteins are produced in short bursts occurring at random time intervals rather than in a continuous manner. This expression pattern arises because elementary processes such as polymerase binding and open complex formation involve small numbers of molecules and therefore show a wide distribution of reaction times. The fluctuations in the number of protein molecules resulting from stochastic gene expression are of special importance if the concentration of the protein constitutes the signal that regulates expression of another gene. In this case one may expect the activation pattern of the regulated gene to be random to some extent as well. The stochastic nature of gene expression and of other biochemical processes involving small numbers of molecules may explain the variation observed in isogenic populations of bacterial cells. Examples are individual chemotactic responses (4), the distribution of generation times observed in Escherichia coli cells (5), and the individual cell responses observed during induction of the lactose (6) and arabinose (7) operons at subsaturating inducer concentrations. Stochastic effects have also been observed in engineered genetic networks in Escherichia coli (8). Given the stochastic phenomena described above, the cell must contain mechanisms that ensure deterministic regulation of cellular processes; the checkpoints in eukaryotic cell cycle regulation (9) are examples of such mechanisms. Computer simulations implementing various Monte Carlo algorithms have proved useful in studying stochastic processes in biochemical reaction networks. This is especially true for several prokaryotic systems for which, owing to their relative simplicity, a large number of quantitative measurements is available.
Examples are the kinetic analysis of developmental pathway bifurcation in phage λ (2), computer simulation of the chemotactic signalling
pathway (4) and mechanistic modelling of the all-or-none effect in lactose operon regulation (10).

The goal of this paper is to study the kinetic mechanisms responsible for the stochastic phenomena of gene expression. Because RNA and protein elongation rates are approximately equal for all genes, different levels of gene expression result from differing frequencies of transcription and translation initiation. We therefore examined the influence of transcription and translation initiation frequencies on the pattern of gene expression. The model of a prokaryotic gene was built using the kinetic parameters of LacZ determined by Kennell and Riezman (11), and it is consistent with the experimental results presented in their work. We systematically varied the frequencies of the transcription and translation initiation reactions while keeping the other parameters of the model constant. Our results indicate that genes expressed from strong promoters (high frequency of transcription initiation) produce proteins at a constant rate, with low variation in the number of molecules. A decrease in the frequency of transcription initiation causes a noisy pattern of gene expression, i.e. the protein is produced in bursts at random time intervals. In contrast, a decrease in the frequency of translation initiation lowers the rate of protein synthesis but does not lead to noisy expression patterns.

FORMULATION OF THE MODEL
We present a kinetic model of the expression of a single prokaryotic gene. The transcription and translation processes are represented by a collection of kinetic equations, which are numerically integrated with the Monte Carlo approach of Gillespie (12, 13). We assume that the cells in which the model gene is expressed are in the exponential growth phase; all of the experimental data cited in our work concern cells in this growth phase. Moreover, under these conditions the concentrations of free RNA polymerase and ribosomes,
available for the single gene, are most probably kept constant by a delicate balance between many coupled reactions (14, 15, 16), such as the synthesis of RNA polymerase and ribosomes, the binding of RNA polymerase and ribosomes to other expressed genes, and the non-specific binding of RNA polymerase to DNA and its idling at pause sites. A detailed model of all the processes that buffer the concentrations of free RNA polymerase and ribosomes would require taking into account the biosynthesis of all components of the gene expression machinery and their binding within all transcription units in the cell. Most of the parameters of such a complex model would be unknown and impossible to estimate. Instead, we included in the model randomly changing pools of free RNA polymerase and ribosomes. As these pools result from many small contributions of other processes, their distributions are expected to be approximately Gaussian. The mean values of these distributions have been set close to the estimates presented by other authors. The fluctuations in the RNA polymerase and ribosome concentrations, described by the standard deviations of the distributions, are unknown; we therefore check the sensitivity of our results to these parameters. Although many research groups are developing detailed kinetic models of various elements of the gene expression machinery (e.g. transcription initiation (17)), the in vivo values of the rate constants of many elementary reactions remain unknown. In contrast, many experimental works describe the kinetics of prokaryotic gene expression (mainly of the LacZ gene in E. coli) in terms of simple effective parameters such as the frequencies of transcription and translation initiation, the rates of RNA polymerase and ribosome movement, and mRNA half-life (11, 18). In our model we adjust the unknown rate constants of elementary reactions to reproduce the measured in vivo values of transcription and translation initiation frequencies, mRNA level and protein synthesis rate.
We assume that the effective parameters obtained in this way implicitly include reactions involving nucleotides, translation initiation factors, divalent ions and other compounds necessary for the proper function of the gene expression machinery. A summary of the kinetic
model and its parameters for the LacZ gene is given in Table 1. Details of the model and of the simulation algorithm are described below.

Model of transcription and its parameters. Transcription initiation is represented in our model as a two-step process:
P + RNAP -> P_RNAP (1)
P_RNAP -> P + RNAP (2)
P_RNAP -> TrRNAP (3)
where P designates the promoter region of the gene, RNAP the RNA polymerase and TrRNAP the transcribing RNA polymerase. Reactions 1 and 2 describe reversible RNA polymerase binding; reaction 3 represents isomerisation of the closed binary complex to the open binary complex. The second-order rate constant of RNAP binding has been set to 10^8 M^-1 s^-1, which is the order of magnitude of RNAP-DNA association rate constants determined in in vitro assays (17). The closed-to-open complex isomerisation rates reported in the literature are of the order of 1 s^-1, and the rate of reaction 3 was set to this value. With the rate constants of reactions 1 and 3 set to these values, the rate of RNAP dissociation (reaction 2) must be of the order of 10 s^-1 to reproduce a transcription initiation frequency close to the value of 0.3 s^-1 reported by Kennell and Riezman (11). Moreover, the binding constant resulting from the rates of reactions 1 and 2 is 10^8 / 10 = 10^7 M^-1, which is in reasonable agreement with values estimated from in vitro measurements (17). To clear the promoter region, the active RNA polymerase must move 30 to 60 nucleotides (17). Since the polymerase moves at about 40 nucleotides/s, this step takes about 1 s (17). The length of the mRNA chain synthesised during this time corresponds roughly to the length of the leader region containing the ribosome binding site (RBS). Therefore, the synthesis of the RBS and promoter
clearance occur at approximately the same rate of 1 s^-1. Whether the RBS appears slightly earlier or later than promoter clearance is irrelevant for the conclusions of our model. We therefore modelled these two processes by a single first-order reaction with a rate constant of 1 s^-1:
TrRNAP -> RBS + P + ElRNAP (4)
where ElRNAP denotes the polymerase elongating a given mRNA molecule. The elongation of the mRNA is not modelled; we assume that the RNA polymerase completes synthesis of the mRNA molecule fast enough to allow ribosome movement at 15 amino acids/s.

Translation, mRNA decay and protein degradation. In prokaryotes, transcription, translation and mRNA degradation are tightly coupled. Following the experimental results of Yarchuk et al. (18), we have assumed the following interdependence between translation and mRNA decay:
i) RNase E and ribosomes compete for the RBS.
ii) If RNase E binds to the RBS before a ribosome does, it degrades the mRNA in the 5'-to-3' direction and does not interfere with the movement of ribosomes that are already bound.
iii) Every ribosome that successfully binds to the RBS completes translation of the protein.
The above assumptions have been modelled by the following reactions:
Ribosome + RBS -> RibRBS (5)
RibRBS -> RBS + Ribosome (6)
RibRBS -> ElRib + RBS (7)
RBS -> decay (8)
where Ribosome represents the pool of free ribosomes, RibRBS the ribosome binding site protected by a bound ribosome, and ElRib the ribosome elongating the protein chain.
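Because reactions 1-4 form a single promoter cycle (free promoter, closed complex, open complex, free promoter again after clearance), the argument that an RNAP dissociation rate of about 10 s^-1 is needed can be checked with a short renewal calculation. The sketch below is an illustration, not the authors' calculation, and it rests on labelled assumptions: a fixed cell volume of 10^-15 L (the value used in the Discussion), a constant pool of 35 free RNAP molecules, and no cell growth. On average the promoter needs (k_off + k_iso)/k_iso binding attempts before one isomerises, each attempt lasting 1/k_bind + 1/(k_off + k_iso) seconds, followed by 1/k_clear seconds of clearance; the initiation frequency is the inverse of this mean cycle time.

```python
AVOGADRO = 6.02214076e23


def per_molecule_rate(k_second_order, volume_litres):
    """Convert a second-order constant (M^-1 s^-1) into the rate (s^-1) at
    which one molecule reacts with one partner molecule in the given volume."""
    return k_second_order / (AVOGADRO * volume_litres)


def transcription_frequency(k_bind, k_off, k_iso, k_clear):
    """Mean initiation frequency (s^-1) of the promoter cycle of reactions 1-4.

    k_bind: pseudo-first-order RNAP binding rate to the free promoter (s^-1)
    k_off, k_iso, k_clear: rates of reactions 2, 3 and 4 (s^-1)
    """
    attempts = (k_off + k_iso) / k_iso          # bindings per isomerisation
    time_per_attempt = 1.0 / k_bind + 1.0 / (k_off + k_iso)
    cycle_time = attempts * time_per_attempt + 1.0 / k_clear
    return 1.0 / cycle_time                     # renewal theorem: 1 / mean cycle


VOLUME = 1e-15                                  # assumed cell volume, litres
k_bind = per_molecule_rate(1e8, VOLUME) * 35    # assumed 35 free RNAP molecules
for k_off in (10.0, 100.0, 1000.0, 10000.0):
    print(k_off, transcription_frequency(k_bind, k_off, k_iso=1.0, k_clear=1.0))
```

Under these assumptions the four dissociation rates used later in the Results give roughly 0.26, 0.052, 0.0057 and 0.0006 s^-1, close to the frequencies obtained in the simulations. The same conversion applied to the Lac repressor association constant quoted in the Discussion, `per_molecule_rate(7e9, 1e-15)`, gives about 11.6 s^-1.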

Reaction 8 models the decay of an unprotected RBS. The parameters of reactions 5-8 are more difficult to estimate than those of transcription initiation. The second-order rate constant for ribosome binding was set to 10^8 M^-1 s^-1, which is of the order of diffusion-limited macromolecular association. The rate of the RBS clearance step in translation initiation (reaction 7) was set to 0.5 s^-1, close to experimental estimates (19). The parameters of reactions 6 and 8 have been adjusted to reproduce the experimentally known rate of protein synthesis and ribosome spacing of the LacZ gene. Another important constraint is provided by the experiments of Yarchuk et al. (18), who introduced mutations in the RBS region of the LacZ gene that resulted in a 200-fold variation in expression level. The decrease in protein synthesis rate was accompanied by a nearly equivalent change in the stationary mRNA level. This means that, for the less effective ribosome binding sites, the rate of mRNA decay exceeded, or at least was not lower than, the transcription initiation frequency; otherwise, a significant mRNA level would be observed even for an RBS practically unprotected by ribosomes. We have set the rate of reaction 8 to 0.3 s^-1, equal to the transcription initiation frequency. The rate of ribosome dissociation then has to be set to 2.25 s^-1 to reproduce the rate of protein synthesis measured by Kennell and Riezman (11). The elongation of the protein chain has been modelled as a single first-order reaction:
ElRib -> Protein (9)
We have used the commonly accepted ribosome movement rate of 15 aa/s. The rate of reaction 9 has therefore been set to 15/L s^-1, where L denotes the length of the protein chain in amino acids; for the 1024 aa product of the LacZ gene this gives 15/1024 ≈ 0.015 s^-1. This treatment of the elongation reaction is simplified with respect to other Monte Carlo models of prokaryotic gene expression (1, 20).
In these publications, the addition of a single amino acid by a single ribosome was simulated as a separate reaction, and ribosome overlap was
prevented by excluded volume constraints. As shown in the following sections, in spite of neglecting the fine details of ribosome movement and using the average rate of protein chain synthesis instead, the model reproduces the experimental data on expression of the LacZ gene. Simplifying the elongation reaction made the simulation computationally fast, so we could collect good statistics from long-timescale trajectories for a large number of parameter combinations. Protein degradation has been simulated by the following reaction:
Protein -> decay (10)
Although protein degradation rates vary, most native proteins have half-lives of the order of hours (21). In our simulations we have applied the half-life of 3 h measured for wild-type β-galactosidase in E. coli cells (22).

RNA polymerase and ribosome pools. The numbers of available RNAP and ribosome molecules were drawn from Gaussian distributions. For RNAP the mean of the distribution was set to 35 molecules, close to estimates made by other authors from experimental data (14, 15, 17). As the number of ribosomes in the cell is approximately an order of magnitude larger than the number of RNAP molecules, the mean of the ribosome pool distribution has been set to 350. We are not able to estimate the standard deviations of these pools, so we have repeated all the simulations with standard deviations equal to 10% and 50% of the means to study the influence of fluctuations in the RNAP and ribosome pools on the gene expression pattern. In the following sections, results are presented for standard deviations equal to 10% of the means unless it is stated explicitly that they were set to 50%.
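The reaction scheme above can be written compactly for Gillespie's direct method, which is described in the simulation protocol that follows. The sketch below illustrates the scheme under stated assumptions and is not the authors' implementation: the cell volume at birth (`V_BIRTH = 1e-15` L) is a hypothetical value, reaction 9 is taken to consume the elongating ribosome ElRib, all molecule numbers other than the promoter are halved with integer division at each division, and the RNAP and ribosome pools are redrawn before every step.

```python
import math
import random

AVOGADRO = 6.02214076e23
V_BIRTH = 1e-15          # hypothetical cell volume at birth, litres
T_GEN = 35 * 60          # generation time from the text, s

# LacZ parameter set from the text: second-order constants in M^-1 s^-1,
# first-order constants in s^-1.
K = {
    "rnap_on": 1e8, "rnap_off": 10.0, "isomerise": 1.0, "clear": 1.0,
    "rib_on": 1e8, "rib_off": 2.25, "rbs_clear": 0.5, "rbs_decay": 0.3,
    "elongate": 15 / 1024,                   # 15 aa/s over a 1024 aa chain
    "prot_decay": math.log(2) / (3 * 3600),  # 3 h protein half-life
}

# Net changes for reactions 1-10. Free RNAP and ribosomes are random pools,
# so they are not tracked; ElRNAP is produced but not followed further.
CHANGES = [
    {"P": -1, "P_RNAP": 1},                # 1  P + RNAP -> P_RNAP
    {"P_RNAP": -1, "P": 1},                # 2  P_RNAP -> P + RNAP
    {"P_RNAP": -1, "TrRNAP": 1},           # 3  P_RNAP -> TrRNAP
    {"TrRNAP": -1, "RBS": 1, "P": 1},      # 4  TrRNAP -> RBS + P + ElRNAP
    {"RBS": -1, "RibRBS": 1},              # 5  Ribosome + RBS -> RibRBS
    {"RibRBS": -1, "RBS": 1},              # 6  RibRBS -> RBS + Ribosome
    {"RibRBS": -1, "ElRib": 1, "RBS": 1},  # 7  RibRBS -> ElRib + RBS
    {"RBS": -1},                           # 8  RBS -> decay
    {"ElRib": -1, "Protein": 1},           # 9  ElRib -> Protein (assumed)
    {"Protein": -1},                       # 10 Protein -> decay
]


def propensities(x, rnap, rib, volume):
    """Return a_mu = h_mu * c_mu for reactions 1-10 in state x."""
    per_molecule = 1.0 / (AVOGADRO * volume)  # M^-1 s^-1 -> s^-1 per pair
    return [
        K["rnap_on"] * per_molecule * rnap * x["P"],
        K["rnap_off"] * x["P_RNAP"],
        K["isomerise"] * x["P_RNAP"],
        K["clear"] * x["TrRNAP"],
        K["rib_on"] * per_molecule * rib * x["RBS"],
        K["rib_off"] * x["RibRBS"],
        K["rbs_clear"] * x["RibRBS"],
        K["rbs_decay"] * x["RBS"],
        K["elongate"] * x["ElRib"],
        K["prot_decay"] * x["Protein"],
    ]


def gaussian_pool(rng, mean, sd_fraction):
    """Draw a non-negative integer pool size from a Gaussian."""
    return max(0, round(rng.gauss(mean, sd_fraction * mean)))


def simulate(t_end, sd_fraction=0.1, seed=0):
    """Run one trajectory; return a list of (time, protein number) pairs."""
    rng = random.Random(seed)
    x = {"P": 1, "P_RNAP": 0, "TrRNAP": 0, "RBS": 0,
         "RibRBS": 0, "ElRib": 0, "Protein": 0}
    t, t_birth = 0.0, 0.0
    trajectory = [(t, 0)]
    while t < t_end:
        # Linear growth from V_BIRTH to 2 * V_BIRTH within each generation.
        volume = V_BIRTH * (1.0 + (t - t_birth) / T_GEN)
        rnap = gaussian_pool(rng, 35, sd_fraction)    # redrawn every step
        rib = gaussian_pool(rng, 350, sd_fraction)
        a = propensities(x, rnap, rib, volume)
        a0 = sum(a)
        if a0 == 0.0:
            continue                                  # empty pool drawn; redraw
        t += rng.expovariate(a0)                      # waiting time tau
        mu = rng.choices(range(len(a)), weights=a)[0]  # probability a_mu / a0
        for species, delta in CHANGES[mu].items():
            x[species] += delta
        if t - t_birth >= T_GEN:
            # Division: keep the single promoter, halve all other numbers.
            t_birth += T_GEN
            for species in ("RBS", "RibRBS", "ElRib", "Protein"):
                x[species] //= 2
        trajectory.append((t, x["Protein"]))
    return trajectory
```

For example, `simulate(2100.0, sd_fraction=0.5, seed=3)` runs one generation with the wider pool distributions. Fast ribosome binding and dissociation events dominate the step count, so a pure-Python loop like this one is slow for 100 trajectories of 10 generations each.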

Simulation protocol. To compute the time evolution of the system described above we have used the Monte Carlo algorithm of Gillespie (12). This method has a sound physical basis and has already proved useful in the simulation of prokaryotic gene expression (1, 2). For the convenience of the reader, a brief description of the Gillespie algorithm is given below; the rigorous derivation of this computational protocol can be found elsewhere (12, 13). Gillespie's algorithm is formulated to study numerically systems of chemical species whose interactions are described by rate equations. At each step of the simulation two random numbers are generated. The first is used to determine which reaction in the system will happen next; the second determines the waiting time for this event. The probability that a given reaction is chosen is proportional to the product of its stochastic rate constant and the numbers of its substrate molecules. The joint distribution of the next reaction and its waiting time is given by

$$P(\tau,\mu) = a_\mu \exp(-a_0 \tau), \qquad a_\mu = h_\mu c_\mu, \qquad a_0 = \sum_\mu a_\mu \qquad (9)$$

where $P(\tau,\mu)\,d\tau$ is the probability that the next reaction occurs in the time interval $(\tau, \tau + d\tau)$ and that it is reaction $\mu$; $h_\mu$ is the product of the numbers of substrate molecules of reaction $\mu$; and $c_\mu$ is the stochastic rate constant of this reaction. Equivalently, reaction $\mu$ is chosen with probability $a_\mu / a_0$, and the waiting time is exponentially distributed with mean $1/a_0$. After the reaction and its waiting time are determined, one elementary reaction is executed (e.g. if the reaction is A + B -> AB, one A molecule and one B molecule are removed from the system and one AB molecule is added). The simulation time is then increased by $\tau$ and the next simulation step is executed. The stochastic rate constant is defined such that $c_\mu\,dt$ is the probability that a particular combination of reactant molecules reacts in the infinitesimally short time interval $dt$. It can be calculated from the
rate constant using simple relations. For first-order reactions both constants are equal. For a second-order reaction the stochastic rate constant is equal to the rate constant divided by the volume of the system; when the rate constant is given in M^-1 s^-1, the volume must be expressed as N_A V, where N_A is Avogadro's number and V the volume in litres. It has been rigorously proven that, in the limit of large numbers of molecules, the Markov chain generated by the Monte Carlo scheme given above converges to the trajectory obtained by deterministic integration of the rate equations (12, 13). The advantage of the Gillespie algorithm over the differential equation approach is that it remains exact for arbitrarily low numbers of molecules in the system. This is essential in the simulation of biochemical systems in which some processes involve single molecules (e.g. the single promoter molecule in transcription initiation). Randomly fluctuating pools of available ribosomes and RNA polymerase have been modelled as follows. Before every step of the Gillespie algorithm, the numbers of RNAP molecules and ribosomes were drawn from Gaussian distributions with the means and standard deviations defined in the previous section. The values of $a_\mu$ (eq. 9) were then calculated for every reaction and the step of the Gillespie algorithm was executed. Drawing the number of substrate molecules from an appropriate probability distribution has already been applied by Arkin et al. (2) in Gillespie algorithm simulations to model rapid-equilibrium phenomena in regulatory protein binding. We have applied the Gillespie algorithm to the model described in the previous sections. At the beginning of the simulation only a single promoter (P) was present in the system; the numbers of all other molecules, except the random pools of RNAP and ribosomes, were set to zero. The simulation was run until it reached the generation time of E. coli. Cell division was then simulated, i.e.
the simulation was continued with a system containing one promoter molecule and all other molecule numbers divided by 2. The generation time was set to 35 min. We have neglected the fact that, during DNA replication, two copies of the bacterial gene can be
expressed. A linear increase in the volume of the growing cell has been assumed: before every step of the Gillespie algorithm, the stochastic rate constants of second-order reactions were recomputed by dividing the corresponding rate constants by the current volume of the cell. Simulating several generations is important for constitutive gene expression. As shown below, after starting the simulation with zero protein molecules, the level of gene expression equilibrates after a few generations; we found that 10 generations are sufficient to equilibrate the system. For each simulation we computed 100 independent trajectories, each covering 10 generations. We used our own implementation of the Gillespie algorithm, which was tested against the examples given in the original work (12).

RESULTS

Validation of the model. To check the validity of the model, we performed a simulation with the parameters describing the LacZ gene. Figure 1A presents the gene expression pattern over ten bacterial generations. The standard deviations of the RNAP and ribosome pools were set to 3.5 and 35, respectively (10% of the mean values). To compare these data with the results of Kennell and Riezman (11), we analysed the changes in the number of protein molecules in the first generation, since the experimental data covered a timescale of about 30 minutes after induction. The transcription initiation frequency obtained in this simulation was 0.258 s^-1, close to the value of 0.3 s^-1 reported by Kennell and Riezman for the LacZ gene. As seen in Figure 1B, after about 500 seconds the number of protein molecules begins to grow linearly with time. The rate of protein synthesis, determined as the slope of this part of the curve, is 26 molecules s^-1, in good agreement with the value of 20 molecules s^-1 reported by Kennell and Riezman (11).
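The synthesis rate quoted above is the slope of the linear part of the protein curve. One way to estimate it is sketched below, assuming an ordinary least-squares fit to all points after the 500 s transient (the text does not state how the slope was fitted); `times` and `proteins` are hypothetical inputs, for example protein numbers sampled from a simulated first generation.

```python
def synthesis_rate(times, proteins, t_start=500.0):
    """Ordinary least-squares slope of protein number vs time for t >= t_start."""
    pts = [(t, n) for t, n in zip(times, proteins) if t >= t_start]
    if len(pts) < 2:
        raise ValueError("need at least two points after t_start")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_n = sum(n for _, n in pts) / len(pts)
    sxy = sum((t - mean_t) * (n - mean_n) for t, n in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    return sxy / sxx          # molecules per second


# Synthetic check: flat for 500 s, then 26 molecules/s.
times = [10.0 * i for i in range(211)]
proteins = [0.0 if t < 500 else 26.0 * (t - 500.0) for t in times]
print(synthesis_rate(times, proteins))
```

Fitting only the points after `t_start` keeps the initial lag, during which mRNA and elongating ribosomes are still accumulating, from biasing the slope.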

The average number of mRNA molecules in the cell was determined from the data shown in Figure 1C. After about 1000 seconds the number of mRNA molecules reached a stationary level; the mean number of molecules in this part of the curve is 56 +/- 4, in good agreement with the value of 62 LacZ mRNA molecules per cell measured by Kennell and Riezman (11). Figure 1D shows the number of ribosomes translating the LacZ mRNA. The average number present after 1000 s is 1838 +/- 73. Therefore, according to our simulation, one LacZ mRNA molecule is occupied on average by about 32 ribosomes. This yields a ribosome spacing of 96 nucleotides (the length of the Z message is 3074 bp according to the GenBank entry), in agreement with the experimental value of 110 nucleotides reported by Kennell and Riezman (11). As shown in the following sections, when the rate of ribosome dissociation is increased in our simulation, the model predicts a decrease in the mRNA level, consistent with the experimental data of Yarchuk et al. (18). We repeated the above simulation with the standard deviations of the RNAP and ribosome pools set to 50% of the mean values and found only minor differences with respect to the results reported above: the rate of protein synthesis is 24 molecules s^-1, the mRNA level 54 +/- 4 molecules, the number of translating ribosomes 1756 +/- 43, and the ribosome spacing 95 nucleotides. We conclude that the results of our simulations are consistent with the experimental data on expression of the LacZ gene. In the next sections we show the predictions of the model for genes with frequencies of transcription and translation initiation different from those of the LacZ gene.
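The prediction that a higher ribosome dissociation rate lowers the mRNA level follows directly from reactions 5-8: an RBS can be degraded (reaction 8) only while it is free, and both dissociation (reaction 6) and clearance (reaction 7) return it to the free state. Treating one transcript in isolation (an idealisation that ignores growth, dilution and pool fluctuations), the mean number of ribosome bindings before decay is k_bind/k_decay, which gives the lifetime and protein yield computed below. The usage settings (350 free ribosomes in a fixed 10^-15 L volume) are hypothetical, so the absolute numbers are not meant to reproduce the simulated LacZ values. The trend is the point: raising k_off shortens the RBS lifetime and lowers the yield together, whereas lowering k_clear (ribosomes stalled at the RBS) lowers the yield but lengthens the lifetime.

```python
AVOGADRO = 6.02214076e23


def rbs_fate(k_bind, k_off, k_clear, k_decay):
    """Mean RBS lifetime (s) and mean proteins initiated per transcript.

    k_bind: pseudo-first-order ribosome binding rate to a free RBS (s^-1)
    k_off, k_clear, k_decay: rates of reactions 6, 7 and 8 (s^-1)
    Decay acts only on a free RBS; reactions 6 and 7 both return the RBS
    to the free state, so a bound ribosome protects it from decay.
    """
    bindings = k_bind / k_decay              # mean ribosome bindings before decay
    lifetime = 1.0 / k_decay + bindings / (k_off + k_clear)
    proteins = bindings * k_clear / (k_off + k_clear)
    return lifetime, proteins


# Hypothetical settings: 350 free ribosomes in a fixed 1e-15 L volume.
k_bind = 1e8 / (AVOGADRO * 1e-15) * 350
for k_off in (2.25, 22.5, 225.0, 2250.0):
    lifetime, proteins = rbs_fate(k_bind, k_off, k_clear=0.5, k_decay=0.3)
    print(f"k_off={k_off:7.2f}  lifetime={lifetime:6.1f} s  proteins={proteins:6.2f}")
```

Because the time-averaged mRNA level is roughly the transcription frequency times the RBS lifetime, the falling lifetime is what the simulations show as a falling mRNA level.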

Effect of varying transcription initiation frequency. Promoter efficiency can be decreased by two factors: a low binding constant of RNA polymerase for the promoter region and/or a low rate of closed complex isomerisation. We performed two sets of simulations in which the transcription initiation frequency was lowered in each of these ways. Low RNAP specificity was simulated by increasing the RNAP dissociation rate. We assume that RNAP binding is limited by 3D diffusion in solution or by 1D diffusion along the DNA molecule, so that the promoter sequence influences only the dissociation rate and not the binding rate. The simulations were performed for RNAP dissociation rates of 10 s^-1, 100 s^-1, 1000 s^-1 and 10000 s^-1, which resulted in transcription initiation frequencies of 0.258 s^-1, 0.052 s^-1, 0.0054 s^-1 and 0.0005 s^-1, respectively. These values span most of the range of transcription initiation frequencies observed so far in prokaryotic systems (10^-4 s^-1 to 1 s^-1). All other parameters of the model and the simulation protocol were kept at the values used for the LacZ gene. Figure 2 shows the results of the calculations. The plots display the values recorded in the 10th generation (time range 18900 s to 21000 s), when the expression pattern of the gene has equilibrated. For the strong promoters (plots A and B), all 100 trajectories shown are similar: the number of protein molecules grows linearly, with smaller differences between trajectories for the stronger promoter. The expression pattern for the lowest transcription initiation frequency analysed (plot D) is qualitatively different: the protein is produced in pulses rather than evenly. In some trajectories the number of protein molecules does not grow at all in the 10th generation, and the plots show only slow decay due to protein degradation.
In other trajectories, the gene is expressed for some time, resulting in a sudden burst in the amount of protein, and the time intervals between these events are random. The gene expression pattern for
the transcription initiation frequency of 0.0054 s^-1 (plot C) is intermediate between the two extremes described above. To express these observations quantitatively, the following analysis was performed. For each 10 s time interval in the whole trajectory of the simulation (10 generations; time range 0 to 21000 s), the mean and standard deviation of the number of protein molecules across the 100 trajectories were calculated. The variation coefficient for each time interval was then calculated as the ratio of the standard deviation to the corresponding mean. These values, plotted as a function of time, are presented in Figure 3A. The plot shows that the relative fluctuations of gene expression are much larger in the first generation of cells than in subsequent ones. After approximately 2.5 generations the curves converge to 0.03, 0.08, 0.26 and 0.84 for transcription initiation frequencies of 0.258 s^-1, 0.052 s^-1, 0.0054 s^-1 and 0.0005 s^-1, respectively. Thus, for the most weakly expressed gene the standard deviation of the number of protein molecules converges to about 84% of the mean, whereas for very strongly expressed genes it does not exceed a few percent. In the next set of simulations, promoter efficiency was lowered by setting the closed complex isomerisation rate (reaction 3) to 1 s^-1, 0.1 s^-1, 0.01 s^-1 and 0.001 s^-1. The resulting transcription initiation frequencies were 0.258 s^-1, 0.035 s^-1, 0.0035 s^-1 and 0.0001 s^-1. The gene expression patterns obtained in these simulations do not differ qualitatively from those presented in Figure 2; Figure 3B shows the corresponding variation coefficients versus time. To examine the influence of fluctuations in the pools of available RNAP and ribosomes, we repeated all of the above simulations with the standard deviations of these pools set to 50% of the mean values. The results do not differ qualitatively from the previous ones; Figure 3C and D show the variation coefficients versus time for these calculations.
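The variation-coefficient analysis described above can be computed from stored trajectories along the following lines. This is a sketch under assumptions the text leaves open: each trajectory is stored as sorted event times with the protein number after each event, the count at the end of each 10 s window is used, and the sample standard deviation (n - 1 denominator) is taken across trajectories.

```python
import bisect
import statistics


def count_at(times, counts, t):
    """Protein number at time t for a piecewise-constant trajectory stored as
    sorted event times and the count immediately after each event."""
    i = bisect.bisect_right(times, t) - 1
    return counts[max(i, 0)]


def variation_coefficients(trajectories, window=10.0, t_end=21000.0):
    """(time, SD / mean) at the end of each window, taken across trajectories.

    trajectories: list of (times, counts) pairs; at least two are needed
    for the sample standard deviation.
    """
    result = []
    for k in range(1, int(t_end // window) + 1):
        t = k * window
        sample = [count_at(times, counts, t) for times, counts in trajectories]
        mean = statistics.fmean(sample)
        if mean > 0:                       # CV is undefined for a zero mean
            result.append((t, statistics.stdev(sample) / mean))
    return result
```

Applied to 100 simulated trajectories, `variation_coefficients(trajectories)` returns up to 2100 (time, CV) pairs for the default 21000 s span, skipping windows in which the mean protein number is zero.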

Effects of varying translation initiation frequency. We performed simulations in which the translation initiation frequency was lowered by increasing the rate of ribosome dissociation. As in the case of RNA polymerase, we assumed that ribosome binding is diffusion-limited and that changes in the ribosome binding site affect the dissociation rate rather than the association rate. Simulations were run for ribosome dissociation rates of 2.25 s^-1, 22.5 s^-1, 225 s^-1 and 2250 s^-1. The protein levels obtained in these calculations covered the 200-fold range observed by Yarchuk et al. (18) for RBSs engineered to have various translation initiation rates. The mRNA levels recorded in the simulations are shown in Figure 4: the number of mRNA molecules decreases as the ribosome dissociation rate increases. These results agree with the experimental observations of Yarchuk et al. (18) and therefore support our treatment of the translation initiation kinetics. The gene expression patterns obtained in these simulations are shown in Figure 5. The protein levels cover a similar range as in the simulations with varying frequency of transcription initiation. The striking difference is that, even for the lowest rate of protein synthesis, the protein molecules are still produced evenly rather than in pulses. The variation coefficients, calculated as described in the previous section, are shown in Figure 6A. The shape of the curves is similar to that calculated for various transcription initiation frequencies, but the final levels are significantly lower: fluctuations in the number of protein molecules are higher for lower rates of ribosome binding, but they do not exceed 15%. We repeated the above calculations with the standard deviations of the numbers of available RNAP and ribosome molecules set to 50% of the means; the results do not differ qualitatively.
Variation coefficients of the number of protein molecules for these calculations are plotted in Figure 6B. The decrease in mRNA level that accompanies a decrease in RBS effectiveness, as observed by Yarchuk et al. (18), implies that the strength of a ribosome binding site is governed
by ribosome binding and dissociation rather than by the subsequent RBS clearance reaction (reaction 7). If ribosomes were stalled at the RBS, they would protect the mRNA from RNase E binding, and the low protein level would not be accompanied by a decrease in the mRNA level. Nevertheless, we also performed simulations in which the frequency of translation initiation was lowered by decreasing the rate of reaction 7 (data not shown). The results do not differ significantly from those shown in Figures 4, 5 and 6.

DISCUSSION

The kinetic model of gene expression presented above was tested against experimental data on the LacZ gene in Escherichia coli. The rate of protein synthesis, the mRNA level and the ribosome spacing were in reasonable agreement with experimental observations. Moreover, the changes in mRNA level due to variation in the ribosome binding rate were correctly reproduced. Applied to the influence of transcription and translation initiation frequencies on the gene expression pattern, the model shows a significant, qualitative difference between these two cases. When the level of gene expression is lowered by decreasing promoter strength, the decrease in the number of protein molecules is accompanied by large fluctuations approaching 100% of the mean value. In contrast, low expression of the gene without large fluctuations in the number of protein molecules can be maintained by decreasing the efficiency of the ribosome binding site, i.e. lowering the effective rate of ribosome binding. For protein levels comparable to those obtained with the lowest transcription initiation frequency analysed, the fluctuations do not exceed 15%. According to our simulations, the model is not sensitive to the magnitude of the fluctuations of the pools of available RNA polymerase and ribosomes: the simulations repeated with variation coefficients of these pools equal to 10% and 50% both gave good agreement with the experimental data.
The main conclusions of our work concerning the fluctuations in gene
expression patterns are also unaffected by the variation coefficients of the RNAP and ribosome pools; their influence is limited to minor changes in the exact values of the standard deviations of the number of protein molecules. According to our model, the qualitative difference between the expression patterns of genes with a low frequency of transcription initiation and genes with a low frequency of translation initiation is caused by the following mechanism. A transcription initiation event is always controlled by a single promoter molecule, whereas translation initiation involves many mRNA molecules recruiting ribosomes or RNases. When expression is limited at the level of translation, each transcript yields only a few proteins, so protein production is the sum of many small, independent contributions, and the fluctuations are reduced by averaging over the ribosome binding sites taking part in the reaction.

Although variable frequencies of translation initiation have been observed in bacterial cells, gene expression is more commonly regulated at the promoter level. This seems to contradict the results of our model, which suggest that maintaining low expression levels by a low frequency of transcription initiation produces large fluctuations in the number of protein molecules. These fluctuations are especially important for the synthesis of regulatory proteins, which are expressed in low amounts; random changes in their concentration could affect regulated genes in an unpredictable way. This raises the question of why a low frequency of transcription initiation is the more common mechanism for maintaining low expression of these proteins. To answer this question, one needs to consider, at least semi-quantitatively, the kinetics of protein-DNA binding. Kinetic studies of Lac repressor binding to DNA revealed that this protein finds its target operator site with a very high rate constant of about 7×10^9 M^-1 s^-1 (23).
To appreciate the consequences of this high second-order rate constant, one can express it as the rate at which a single repressor molecule binds its operator site in the volume of the cell (10⁻¹⁵ L). The result, 12 s⁻¹, is an order of magnitude higher than the largest transcription initiation rates observed. This means that binding of even a single repressor molecule to its operator site would still be faster than RNA polymerase recruitment and activation. Thus, if the fluctuations in the number of protein molecules were comparable to those presented in Figure 2D, the number of repressor molecules present in most cells would still be sufficient for rapid binding to the target gene, and the fluctuations would not influence the gene regulation process. The situation can be more complex if the gene is influenced by several regulatory proteins, as happens in the complex life cycles of bacteriophages. In this case, random changes in the relative amounts of these proteins can trigger bifurcations in developmental pathways, as shown by Arkin et al. (2). We conclude that the low level of proteins regulating bacterial operons can be maintained by a low transcription initiation frequency because regulatory proteins bind to their DNA sites with a very high efficiency, exceeding the diffusion limit. In contrast, keeping the expression level low through inefficient ribosome binding is not optimal from the energetic point of view, since mRNA molecules have to be synthesised unnecessarily. Therefore, expression of most regulatory proteins from weak promoters is an evolutionarily preferred strategy in spite of the large random fluctuations in the number of protein molecules. It is also worth commenting on the difference between constitutive and induced gene expression implied by our model. The curves in Figures 3 and 6 indicate that the relative magnitude of random variation is significantly higher in the bacterial generation in which the gene has been induced than in the subsequent generations. Therefore, according to the model, constitutively expressed genes show a lower level of fluctuations than induced genes. On the other hand, many induced genes coding for enzymes are expressed at very high levels, so that the fluctuations in the number of protein molecules quickly fall to very low values (see Figure 3A).
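The conversion of the Lac repressor association rate constant into a single-molecule binding rate in the Discussion is short arithmetic: one molecule in 10⁻¹⁵ L corresponds to about 1.7 nM, and multiplying by the second-order constant gives a first-order rate. The sketch below reproduces it; the comparison value of 0.258 s⁻¹ is the highest transcription initiation frequency used in the simulations of Figure 2, chosen here only for illustration.

```python
AVOGADRO = 6.02214076e23  # 1/mol
CELL_VOLUME_L = 1e-15     # cell volume used in the Discussion (10^-15 L)


def single_molecule_rate(k_assoc, volume_l=CELL_VOLUME_L):
    """Rate (1/s) at which one molecule binds one target site in volume_l,
    given a second-order association constant k_assoc in 1/(M*s)."""
    one_molecule_molar = 1.0 / (AVOGADRO * volume_l)  # about 1.66e-9 M
    return k_assoc * one_molecule_molar


repressor_rate = single_molecule_rate(7e9)  # Lac repressor, ref. (23)
highest_tif = 0.258                         # highest simulated TIF, Figure 2A (1/s)
ratio = repressor_rate / highest_tif
print(f"{repressor_rate:.1f} 1/s, {ratio:.0f}x the highest simulated TIF")
```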
We have presented a kinetic model of gene expression that is able to reproduce a variety of experimental observations concerning the model prokaryotic gene LacZ. We believe that the results obtained with the model provide insights into the origins of the stochastic patterns of prokaryotic gene expression. These effects may be of interest both for efforts aimed at understanding the molecular details of gene regulation and for biotechnology, where optimisation of protein overexpression in bacteria is one of the primary goals.

ACKNOWLEDGEMENTS

We are grateful to Prof. A.L. Haenni for critical comments on the manuscript. This work was supported by an internal PW5 grant from IBB PAS.

REFERENCES

1. McAdams, H.H., Arkin, A. (1997) Proc. Natl. Acad. Sci. USA 94, 814-819
2. Arkin, A., Ross, J., McAdams, H.H. (1998) Genetics 149, 1633-1648
3. Newlands, S., Levitt, L.K., Robinson, C.S., Karpf, C.A.B., Hodgson, V.R.M., Wade, R.P., Hardeman, E.C. (1998) Genes & Development 12, 2748-2758
4. Levin, M.D., Morton-Firth, C.J., Abouhamad, W.N., Bourret, R.B., Bray, D. (1998) Biophys. J. 74, 175-181
5. Tyson, J.J., Hannsgen, K.B. (1985) J. Theor. Biol. 113, 29-62
6. Maloney, P.C., Rotman, B. (1973) J. Mol. Biol. 73, 77-91
7. Siegele, D.A., Hu, J.C. (1997) Proc. Natl. Acad. Sci. USA 94, 8168-8172
8. Elowitz, M.B., Leibler, S. (2000) Nature 403, 335-338
9. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., Watson, J.D. (1994) Molecular Biology of the Cell, 3rd Ed., Garland Publishing, London, NY
10. Carrier, T.A., Keasling, J.D. (1999) J. Theor. Biol. 201, 25-36
11. Kennell, D., Riezman, H. (1977) J. Mol. Biol. 114, 1-21
12. Gillespie, D.T. (1977) J. Phys. Chem. 81, 2340-2361
13. Gillespie, D.T. (1992) Physica A 188, 404-425
14. Bremer, H., Dennis, P.P. (1996) Escherichia coli and Salmonella, 2nd Ed., ASM Press, pp. 1553-1569
15. McClure, W.R. (1985) Annu. Rev. Biochem. 54, 171-204
16. Neidhardt, F.C. (1999) J. Bacteriol. 181, 7405-7408
17. Record, M.T., Jr., Reznikoff, W.S., Craig, M.L., McQuade, K.L., Schlax, P.J. (1996) Escherichia coli and Salmonella, 2nd Ed., ASM Press, pp. 792-821
18. Yarchuk, O., Jacques, N., Guillerez, J., Dreyfus, M. (1992) J. Mol. Biol. 226, 581-596
19. Draper, D.E. (1996) Escherichia coli and Salmonella, 2nd Ed., ASM Press, pp. 902-908
20. Carrier, T.A., Keasling, J.D. (1997) J. Theor. Biol. 189, 195-209
21. Goldberg, A.L., Dice, J.F. (1974) Annu. Rev. Biochem. 43, 835-869
22. Bergquist, P.L., Truman, P. (1978) Mol. Gen. Genet. 164, 105-108
23. Lin, S., Riggs, A.D. (1975) Cell 4, 107-111

Table 1. Summary of the kinetic model and its parameters for the LacZ gene.

Reaction/process | Rate constants | Comments
Pool of available RNA polymerase | Gaussian distribution with µ = 35 and δ = 3.5 molecules | Order of magnitude of the mean set according to experimental estimations (17, 14). Sensitivity of the results to the value of δ has been tested.
Pool of available ribosomes | Gaussian distribution with µ = 350 and δ = 35 molecules | As for the RNA polymerase pool.
RNA polymerase binding and dissociation | Association rate: 10⁸ M⁻¹ s⁻¹; dissociation rate: 10 s⁻¹ | Association rate set according to experimental estimations (17). Dissociation rate set to reproduce the transcription initiation frequency measured by Kennell & Riezman (11).
Closed complex isomerisation | 1 s⁻¹ | According to experimental estimations (17).
Promoter clearance and RBS synthesis | 1 s⁻¹ | Estimated using the rate of polymerase movement and the lengths of the RBS and promoter sequences.
Ribosome binding and dissociation | Association rate: 10⁸ M⁻¹ s⁻¹; dissociation rate: 2.25 s⁻¹ | Association rate set to the order of magnitude of diffusion-limited aggregation. Dissociation rate set to reproduce the proper translation initiation frequency.
RBS clearance | 0.5 s⁻¹ | According to experimental estimations (19).
mRNA degradation | 0.3 s⁻¹ | Set close to the transcription initiation frequency. Values at least as high as the transcription initiation frequency are necessary to ensure that the mRNA level is lowered by the decrease in protection of the RBS by bound ribosomes (as observed by Yarchuk et al. (18)).
Elongation of protein chain | 0.015 s⁻¹ | According to the commonly accepted ribosome movement rate of 15 aa/s and the length of β-galactosidase.
Protein degradation | 6.42 × 10⁻⁵ s⁻¹ | According to the β-galactosidase half-life of 3 h measured by Bergquist and Truman (22).
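Several entries of Table 1 follow from simple arithmetic, sketched below. The sketch assumes a cell volume of 10⁻¹⁵ L (the value used in the Discussion) and a β-galactosidase chain length of 1024 residues, which the table does not state. The conversion of a second-order constant into a per-target rate for N free partners, k·N/(N_A·V), is the standard Gillespie propensity conversion. The helper `draw_pool` is hypothetical: the table says only that the pools are Gaussian, so rejecting negative draws is our assumption, and how often the pools are redrawn during a simulation is not specified here.

```python
import math
import random
import statistics

AVOGADRO = 6.02214076e23  # 1/mol
CELL_VOLUME_L = 1e-15     # assumption: ~1 fL, as in the Discussion


def draw_pool(rng, mean, cv):
    """Hypothetical helper: draw a non-negative integer pool size from a Gaussian
    with standard deviation cv * mean; negative draws are rejected and redrawn."""
    while True:
        n = round(rng.gauss(mean, cv * mean))
        if n >= 0:
            return n


def pseudo_first_order(k_assoc, n_molecules, volume_l=CELL_VOLUME_L):
    """Convert a second-order constant (1/(M*s)) into a per-target rate (1/s)
    for n_molecules free binding partners in volume_l."""
    return k_assoc * n_molecules / (AVOGADRO * volume_l)


# Rate constants derived from quantities quoted in Table 1.
k_protein_deg = math.log(2) / (3 * 3600)      # 3 h half-life of beta-galactosidase
k_elongation = 15 / 1024                      # 15 aa/s over an assumed 1024-residue chain
k_on_rnap = pseudo_first_order(1e8, 35)       # mean RNA polymerase pool
k_on_ribosome = pseudo_first_order(1e8, 350)  # mean ribosome pool

# Pool sizes at 10% and 50% variation coefficients (cf. Figures 3 and 6).
rng = random.Random(0)
rnap_10 = [draw_pool(rng, 35, 0.10) for _ in range(100_000)]
rnap_50 = [draw_pool(rng, 35, 0.50) for _ in range(100_000)]
mean_10 = statistics.fmean(rnap_10)
cv_10 = statistics.pstdev(rnap_10) / mean_10
mean_50 = statistics.fmean(rnap_50)
print(f"{k_protein_deg:.3g} {k_elongation:.3g} {k_on_rnap:.2f} {k_on_ribosome:.1f}")
print(f"10% pool: mean {mean_10:.2f}, CV {cv_10:.3f}; 50% pool: mean {mean_50:.2f}")
```

With a 50% variation coefficient, rejecting negative draws pushes the sampled mean slightly above 35 molecules; clipping at zero instead would bias it differently, so the choice matters only for the widest pools.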

Figure 1. Simulation of LacZ gene expression. A) Number of protein molecules as a function of time. The results for 100 trajectories recorded over 10 bacterial generations are shown (generation time 2100 s). After every generation the numbers of protein and mRNA molecules were divided by 2. B) Number of protein molecules recorded in the first generation. In every 10 s time interval the number of protein molecules was averaged over 100 trajectories and the standard deviation computed. Data on the plot are the mean (average trajectory) and ±3σ values for each time interval. A linear function is fitted to all data points on the average trajectory recorded at times exceeding 500 s. C) Average and ±3σ trajectories for the number of mRNA molecules in the first generation. D) Average and ±3σ trajectories for the number of ribosomes moving on LacZ mRNA.

Figure 2. Expression patterns for various transcription initiation frequencies. On each plot, 100 trajectories recorded in the 10th generation are plotted with red lines. Average and ±3σ trajectories are plotted with black lines. On plot D the −3σ trajectory is not shown, as it would correspond to negative numbers of molecules. Plots A, B, C and D present results for transcription initiation frequencies of 0.258 s⁻¹, 0.052 s⁻¹, 0.0054 s⁻¹ and 0.0005 s⁻¹, respectively.

Figure 3. Variation coefficients of the number of protein molecules calculated for different transcription initiation frequencies. For every simulation, the average number of protein molecules and its standard deviation over 100 trajectories were computed in each 10 s time interval. Data on the plots are standard deviation/mean ratios for every time interval. Plots span the time scale of 10 generations. A) Transcription initiation frequency (TIF) lowered by increasing the RNAP dissociation rate. Curves I, II, III and IV correspond to TIF 0.258 s⁻¹, 0.052 s⁻¹, 0.0054 s⁻¹ and 0.0005 s⁻¹, respectively. B) TIF lowered by decreasing the closed complex isomerisation rate. Curves I, II, III and IV correspond to TIF 0.258 s⁻¹, 0.035 s⁻¹, 0.0035 s⁻¹ and 0.0001 s⁻¹. Plots C and D show results for simulations in which the standard deviations of the RNAP and ribosome pools were changed from 10% to 50% of the mean values. C) TIF changed via the RNAP dissociation rate. Curves I, II, III and IV correspond to TIF values of 0.259 s⁻¹, 0.051 s⁻¹, 0.0051 s⁻¹ and 0.0003 s⁻¹. D) TIF changed via the closed complex isomerisation rate. Curves I, II, III and IV correspond to TIF values of 0.259 s⁻¹, 0.034 s⁻¹, 0.0034 s⁻¹ and 0.0001 s⁻¹. Numbers in parentheses show the values to which the curves converge.
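The per-interval ensemble statistics described in the legends of Figures 1, 3 and 6 (mean, standard deviation, their ratio and the ±3σ envelope across 100 trajectories in 10 s intervals) can be sketched as follows. The helpers are hypothetical: they assume each trajectory is stored as a list of (time, count) change points starting at t = 0, and they take the count at the start of each 10 s interval as that interval's sample. The legends do not say how values within an interval were reduced, nor whether the sample or population standard deviation was used; the sample version is used here.

```python
import math
import statistics


def value_at(times_counts, grid):
    """Sample a piecewise-constant trajectory, given as (time, count) change
    points sorted by time and starting at t = 0, at each grid time."""
    out, i, current = [], 0, times_counts[0][1]
    for g in grid:
        while i < len(times_counts) and times_counts[i][0] <= g:
            current = times_counts[i][1]
            i += 1
        out.append(current)
    return out


def ensemble_statistics(trajectories, t_end, dt=10.0):
    """Across-trajectory mean, standard deviation, CV and the +/-3 sd envelope
    on a time grid of width dt (s)."""
    grid = [k * dt for k in range(int(t_end // dt) + 1)]
    sampled = [value_at(tr, grid) for tr in trajectories]
    rows = []
    for k, t in enumerate(grid):
        values = [s[k] for s in sampled]
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        cv = sd / mean if mean > 0 else math.nan
        rows.append((t, mean, sd, cv, mean - 3 * sd, mean + 3 * sd))
    return rows


# Two toy trajectories that jump from 0 to 10 and to 30 molecules at t = 5 s.
trajectories = [[(0.0, 0), (5.0, 10)], [(0.0, 0), (5.0, 30)]]
rows = ensemble_statistics(trajectories, t_end=10.0)
for row in rows:
    print(row)
```

The lower envelope, mean − 3σ, becomes negative whenever the CV exceeds 1/3, which is the situation in which a legend omits it.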

Figure 4. Number of mRNA molecules calculated for different ribosome dissociation rates. A) I: ribosome dissociation rate 2.25 s⁻¹; II: ribosome dissociation rate 22.5 s⁻¹. B) Ribosome dissociation rate 2250 s⁻¹. The results for a ribosome dissociation rate of 225 s⁻¹ are not shown, as the curve would overlap with the curve shown in plot B.
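The dependence of the mRNA level on the ribosome dissociation rate rests on the protection mechanism stated in Table 1: RNase attack competes with ribosome binding for a free RBS. A minimal analytic sketch of that competition, under strong simplifying assumptions, is given below. A single RBS alternates between a free and a ribosome-bound state; it is degraded only when free; RBS clearance frees it again and counts as one translation initiation; elongating ribosomes, pool fluctuations and dilution are ignored. First-step analysis then gives the expected mRNA lifetime and the expected number of initiations per mRNA. The rates come from Table 1 with the ribosome pool fixed at its mean of 350 in an assumed 10⁻¹⁵ L; the output is an illustration, not the paper's simulation results.

```python
AVOGADRO = 6.02214076e23  # 1/mol
CELL_VOLUME_L = 1e-15     # assumed ~1 fL cell volume


def rbs_protection(k_on, k_off, k_clear, k_decay):
    """First-step analysis for one RBS switching between free and bound states.

    free -> bound at k_on; bound -> free at k_off (dissociation) or at k_clear
    (RBS clearance, i.e. one translation initiation); free -> degraded at
    k_decay (RNase attack only on a free RBS).
    Returns (expected mRNA lifetime in s, expected initiations per mRNA).
    """
    visits_bound = k_on / k_decay  # expected number of ribosome binding events
    lifetime = 1.0 / k_decay + visits_bound / (k_off + k_clear)
    initiations = visits_bound * k_clear / (k_off + k_clear)
    return lifetime, initiations


k_on = 1e8 * 350 / (AVOGADRO * CELL_VOLUME_L)  # ribosome binding, ~58 1/s
results = []
for k_off in (2.25, 22.5, 225.0, 2250.0):  # dissociation rates of Figures 4-6
    life, n_init = rbs_protection(k_on, k_off, k_clear=0.5, k_decay=0.3)
    results.append((k_off, life, n_init))
    print(f"k_off = {k_off:7.2f} 1/s  lifetime = {life:6.1f} s  "
          f"initiations per mRNA = {n_init:6.2f}")
```

Because the steady-state mRNA number is roughly the transcription initiation frequency times the mRNA lifetime, this sketch predicts fewer mRNA molecules as ribosome dissociation speeds up, with the lifetime levelling off at 1/k_decay once ribosomes rarely protect the RBS.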

Figure 5. Expression patterns for various ribosome dissociation rates. On each plot, 100 trajectories recorded in the 10th generation are plotted with red lines. Average and ±3σ trajectories are plotted with black lines. A) I: ribosome dissociation rate 2.25 s⁻¹; II: ribosome dissociation rate 22.5 s⁻¹. B) I: ribosome dissociation rate 225 s⁻¹; II: ribosome dissociation rate 2250 s⁻¹.

Figure 6. Variation coefficients of the number of protein molecules calculated for various ribosome dissociation rates. For every simulation, the average number of protein molecules and its standard deviation over 100 trajectories were computed in each 10 s time interval. Data on the plots are standard deviation/mean ratios for every time interval. Plots span the time scale of 10 generations. Curves I, II, III and IV show results for ribosome dissociation rates of 2.25 s⁻¹, 22.5 s⁻¹, 225 s⁻¹ and 2250 s⁻¹. Numbers in parentheses denote the values to which the curves converge. A) Results for the simulations in which the standard deviations of the RNAP and ribosome pools were set to 10% of the mean value. B) Results for the simulations in which the standard deviations of the RNAP and ribosome pools were set to 50% of the mean value.

The effect of transcription and translation initiation frequencies on the stochastic fluctuations in prokaryotic gene expression. Andrzej M. Kierzek, Jolanta Zaim and Piotr Zielenkiewicz. J. Biol. Chem., published online November 2, 2000. doi: 10.1074/jbc.M006264200