Lecture 7: Simple genetic circuits I

Paul C Bressloff (Fall 2018)

7.1 Transcription and translation

In Fig. 20 we show the two main stages in the expression of a single gene according to the central dogma.

1. Transcription (DNA $\to$ RNA). The first major stage of gene expression is the synthesis of a messenger RNA (mRNA) molecule with a nucleotide sequence complementary to the DNA strand from which it is copied; the mRNA serves as the template for protein synthesis. Transcription is mediated by a molecular machine known as RNA polymerase (RNAP). The key steps in transcription are binding of RNA polymerase ($P$) to the relevant promoter region of DNA ($D$) to form a closed complex ($PD_c$), the unzipping of the two strands of DNA to form an open complex ($PD_o$), and finally promoter escape, when RNA polymerase reads one of the exposed strands:
\[ P + D \underset{k_-}{\overset{k_+}{\rightleftharpoons}} PD_c \xrightarrow{k_{\rm open}} PD_o \xrightarrow{k_{\rm escape}} \text{transcription}. \]
Once the RNAP is reading the strand, the promoter is unoccupied and ready to accept a new polymerase. The binding/unbinding of polymerase is very fast, $k_\pm \gg k_{\rm open}$, so that the first step happens many times before formation of an open complex. Hence, one can treat the RNA polymerase as being in quasi-equilibrium with the promoter, and the rate of transcription will thus be proportional to the fraction of time the promoter is occupied by RNA polymerase.

Figure 20: Schematic diagram of a eukaryotic cell showing the main steps of protein synthesis according to the central dogma. See text for details.

2. Translation (RNA $\to$ protein). The second major stage is synthesis of a protein from mRNA. In the case of eukaryotes, transcription takes place in the cell nucleus, whereas subsequent protein synthesis takes place in the cytoplasm, which means that the mRNA has to be exported from the nucleus as an intermediate step. Translation is mediated by a macromolecule known as a ribosome, which produces a string of amino acids (a polypeptide chain), each specified by a codon (represented by three letters) on the mRNA molecule. Since there are four nucleotides (A, U, C, G), there are $4^3 = 64$ distinct codons, e.g., AUG, CGG, most of which code for a single amino acid. The process of translation consists of ribosomes moving along the mRNA without backtracking (from one end to the other, technically known as the 5′ end to the 3′ end) and is conceptually divided into three major stages (as is transcription): initiation, elongation and termination. Each elongation step involves reading a codon and binding a freely diffusing transfer RNA (tRNA) molecule that carries the specific amino acid corresponding to that codon. Once the chain of amino acids has been generated, a number of further processes occur in order to generate a correctly folded protein.

7.2 Simple model of protein synthesis

A very simple model of (unregulated) gene expression is shown in Fig. 21. Let $a$ and $x$ denote the concentrations of mRNA and protein, respectively, due to the activity of a single gene. The kinetic equations take the simple form
\[ \frac{da}{dt} = \kappa - \gamma a, \qquad \frac{dx}{dt} = \kappa_p a - \gamma_p x, \tag{7.1} \]
where $\kappa$ is the transcription rate of mRNA by the gene, $\kappa_p$ is the translation rate of protein by each mRNA molecule, and $\gamma$, $\gamma_p$ are degradation rates. At steady state, the protein concentration is
\[ x^* = \frac{\kappa_p}{\gamma_p}\, a^* = \frac{\kappa_p \kappa}{\gamma_p \gamma}. \]
Figure 21: Unregulated transcription of a gene x following binding of RNA polymerase to the promoter region. The resulting mRNA exits the nucleus and is then translated by ribosomes to form protein.
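As a quick numerical check of (7.1), the following minimal sketch integrates the two equations with a forward Euler step and compares the result with the steady state $x^* = \kappa_p\kappa/(\gamma_p\gamma)$. The parameter values, the step size and the function name `simulate` are illustrative assumptions, not values taken from the lecture.

```python
def simulate(kappa, gamma, kappa_p, gamma_p, t_end, dt=0.01):
    """Forward-Euler integration of da/dt = kappa - gamma*a, dx/dt = kappa_p*a - gamma_p*x."""
    a, x = 0.0, 0.0  # start with no mRNA and no protein
    for _ in range(int(round(t_end / dt))):
        da = kappa - gamma * a
        dx = kappa_p * a - gamma_p * x
        a += dt * da
        x += dt * dx
    return a, x

# Assumed rates in arbitrary units (not from the lecture)
kappa, gamma, kappa_p, gamma_p = 10.0, 1.0, 2.0, 0.1

a_end, x_end = simulate(kappa, gamma, kappa_p, gamma_p, t_end=200.0)
a_star = kappa / gamma                        # steady-state mRNA level
x_star = kappa_p * kappa / (gamma_p * gamma)  # steady-state protein level

print("mRNA:   ", a_end, "vs steady state", a_star)
print("protein:", x_end, "vs steady state", x_star)
```

With these rates the protein relaxes on the slow time scale $1/\gamma_p$, so the integration time has to be several multiples of $1/\gamma_p$ before the two printed values agree.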

Molecular/intrinsic noise. The production of mRNA from a typical gene in E. coli occurs at a rate of around 10 nM per minute, while the average lifetime of mRNA due to degradation is around a minute. This implies that on average $a \approx 1$ to $10$ nM. The translation rate is a few proteins per minute and the protein lifetime tends to be at least several hours. Hence $x \approx 100$ to $1000$ nM. Given that the cell volume of E. coli is around $10^{-15}$ L, it follows that 1 nM corresponds to roughly one molecule per cell. This implies that the steady-state levels of mRNA and protein are typically fewer than $10^4$ molecules per cell! Deterministic chemical kinetic equations based on the law of mass action assume that the concentrations of the various reactants vary continuously. For molar concentrations with molecule numbers of order $10^{23}$ this is a reasonable approximation. On the other hand, the numbers of mRNA and protein molecules within a cell are much smaller, and one has to take into account the discrete nature of chemical reactions (molecular noise). This form of noise is inherent to a given system of interest and is thus a form of intrinsic noise, rather than arising from external factors (extrinsic noise).

7.3 Gene regulation

The above picture ignores a major feature of cellular processing, namely, gene regulation. Individual cells frequently have to make decisions, that is, to express different genes at different spatial locations and times, and at different activity levels.

Figure 22: Transcriptional regulation due to the binding of a repressor or activator protein to a promoter region along the DNA. (a) Increased transcription due to the binding of an activator protein to the promoter. An activator typically transitions between inactive and active forms; the active form has a high affinity for the promoter binding site. An external chemical signal can regulate transitions between the active and inactive states.
(b) Transcription can be stopped (or reduced) by a repressor protein binding to the promoter and blocking the binding of RNA polymerase.

One of the most important mechanisms of genetic control is transcriptional regulation, that is, determining whether or not an mRNA molecule is made. The control of transcription (switching a gene on or off) is mediated by proteins known as transcription factors, see Fig. 22. Negative control (or repression) is mediated by repressors that bind to a promoter region along the DNA where RNA polymerase has to bind in order to initiate transcription, and thus inhibit transcription. On the other hand, positive control (activation) is mediated by activators that increase the probability of RNA polymerase binding to the promoter.

Note 5. The presence of transcription factors means that cellular processes can be controlled by extremely complex gene networks, in which the expression of one gene produces a repressor or activator, which then regulates the expression of the same gene or another gene. This can result in many negative and positive feedback loops, the understanding of which lies at the heart of systems biology. In addition to transcriptional regulation, there are a variety of other mechanisms that can control gene expression, including mRNA and protein degradation, and translational regulation.

7.4 Autoregulatory network

One of the simplest examples of a gene network with regulatory feedback is autoregulation, in which a gene is directly regulated by its own gene product, see Fig. 23. A simple kinetic model of autoregulatory feedback is
\[ \frac{da}{dt} = \kappa g(x) - \gamma a, \qquad \frac{dx}{dt} = \kappa_p a - \gamma_p x. \tag{7.2} \]
The function $g(x)$ represents the nonlinear feedback effect of the protein on the transcription of mRNA. A typical choice for $g$ in the case of an activator ($+$) or repressor ($-$) is to express it in terms of a Hill function, see Fig. 23. That is, $g = g_-$ or $g = g_+$ with
\[ g_+(x) = 1 + \sigma \frac{x^2}{K + x^2}, \qquad g_-(x) = \frac{K}{K + x^2} + \sigma, \tag{7.3} \]
with $\sigma$ determining the rate of production in the limit $x \to \infty$. One can view the quadratic terms as arising from dimerization, with $K$ a dissociation constant (inverse of an equilibrium constant).
We can further simplify the dynamics by assuming that the mRNA is in quasi-equilibrium, $a = \kappa g(x)/\gamma$, so that the protein concentration evolves according to
\[ \frac{dx}{dt} = -\gamma_p x + \frac{\kappa_p \kappa}{\gamma}\, g(x) \equiv f(x). \tag{7.4} \]
A fixed point $x^*$ of the kinetic equation (7.4) is given by a solution of the algebraic equation $f(x^*) = 0$. One can thus determine the number of fixed points and their stability using a graphical construction, see Fig. 24. In the case of negative feedback there is a single stable fixed point, whereas with positive feedback the network can be monostable or bistable. In the latter case, the autoregulatory network can act as a genetic switch.
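The graphical construction can also be mimicked numerically. The sketch below scans $f(x)$ in (7.4) for sign changes, refines each root by bisection, and classifies it by the sign of a finite-difference slope. The lumped parameter `beta` $= \kappa_p\kappa/\gamma$, all numerical values and the helper names are assumptions chosen only so that the positive-feedback case is bistable.

```python
def g_plus(x, K=1.0, sigma=50.0):
    """Activator Hill function g_+(x) from (7.3); K and sigma are assumed values."""
    return 1.0 + sigma * x**2 / (K + x**2)

def g_minus(x, K=1.0, sigma=0.1):
    """Repressor Hill function g_-(x) from (7.3); K and sigma are assumed values."""
    return K / (K + x**2) + sigma

def make_f(g, beta, gamma_p=1.0):
    """Right-hand side f(x) = -gamma_p*x + beta*g(x) of (7.4), with beta = kappa_p*kappa/gamma."""
    return lambda x: -gamma_p * x + beta * g(x)

def bisect(f, lo, hi, tol=1e-12):
    """Bisection on an interval [lo, hi] over which f changes sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def fixed_points(f, x_max=10.0, n=10000):
    """Find roots of f on [0, x_max] and label each by the sign of f'(x*)."""
    h = x_max / n
    found = []
    for i in range(n):
        lo, hi = i * h, (i + 1) * h
        if f(lo) * f(hi) < 0.0:
            x_star = bisect(f, lo, hi)
            slope = (f(x_star + 1e-6) - f(x_star - 1e-6)) / 2e-6
            found.append((x_star, "stable" if slope < 0 else "unstable"))
    return found

positive = fixed_points(make_f(g_plus, beta=0.05))   # strong positive feedback
negative = fixed_points(make_f(g_minus, beta=2.0))   # negative feedback

print("positive feedback:", positive)
print("negative feedback:", negative)
```

For these assumed parameters the positive-feedback case has two stable fixed points separated by an unstable one, which is the bistable (switch-like) regime; reducing `sigma` sufficiently leaves a single fixed point.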

Figure 23: (a) An autoregulatory network. A gene x is repressed (or activated) by its own protein product. If the feedback connector ends in a bar then this indicates repression, whereas if it ends in an arrow then this indicates activation. (b) Promoter activity functions $g_-(x)$ and $g_+(x)$ for repression and activation, respectively, as a function of protein concentration $x$ (axes: $\log g_\pm(x)$ against $\log x$).

Linear stability of fixed points. Consider the general dynamical equation
\[ \frac{dx}{dt} = f(x), \]
and suppose there is a fixed point $x^*$ for which $f(x^*) = 0$. Consider a small perturbation about the fixed point by writing $x(t) = x^* + \delta x(t)$. Substituting into the differential equation and Taylor expanding to first order, we have
\[ \frac{d(x^* + \delta x(t))}{dt} = f(x^*) + f'(x^*)\,\delta x(t) + O(\delta x^2), \qquad f'(x) = \frac{df}{dx}, \]
where $O(\delta x^2)$ means terms involving quadratic or higher powers of $\delta x$. Since $x^*$ is a constant and $f(x^*) = 0$, we obtain the linear equation
\[ \frac{d\,\delta x}{dt} = f'(x^*)\,\delta x \quad \Longrightarrow \quad \delta x(t) = \delta x(0)\, e^{f'(x^*) t}. \]
Hence, if $f'(x^*) < 0$ then $\delta x(t) \to 0$ as $t \to \infty$ and the fixed point is stable. If $f'(x^*) > 0$ then it is unstable. A similar analysis can be carried out for the higher dimensional equation
\[ \frac{dx_j}{dt} = f_j(\mathbf{x}), \qquad \mathbf{x} = (x_1, \ldots, x_d). \tag{7.5} \]
Now we have
\[ \frac{d\,\delta x_i}{dt} = \sum_{j=1}^{d} M_{ij}\, \delta x_j, \qquad M_{ij} = \left. \frac{\partial f_i}{\partial x_j} \right|_{\mathbf{x} = \mathbf{x}^*}. \]
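For the two-variable model (7.2), the matrix $M_{ij}$ can be estimated by central finite differences. The sketch below does this for the repressor case and evaluates the trace and determinant of $M$; for a $2 \times 2$ matrix both eigenvalues have negative real part precisely when the trace is negative and the determinant is positive. The parameter values and the helper names (`rhs`, `jacobian`, `relax`) are assumptions made for illustration.

```python
def rhs(state, kappa=2.0, gamma=1.0, kappa_p=1.0, gamma_p=1.0, K=1.0, sigma=0.1):
    """Right-hand side of (7.2) with the repressor Hill function g_-(x); rates are assumed."""
    a, x = state
    g = K / (K + x**2) + sigma
    return [kappa * g - gamma * a, kappa_p * a - gamma_p * x]

def jacobian(f, point, eps=1e-6):
    """Central-difference estimate of M_ij = d f_i / d x_j evaluated at `point`."""
    d = len(point)
    M = [[0.0] * d for _ in range(d)]
    for j in range(d):
        up = list(point)
        dn = list(point)
        up[j] += eps
        dn[j] -= eps
        f_up, f_dn = f(up), f(dn)
        for i in range(d):
            M[i][j] = (f_up[i] - f_dn[i]) / (2 * eps)
    return M

def relax(f, state, dt=0.01, steps=20000):
    """Crude fixed-point search: integrate forward (Euler) until the state settles.
    This only finds stable fixed points."""
    for _ in range(steps):
        deriv = f(state)
        state = [s + dt * v for s, v in zip(state, deriv)]
    return state

state = relax(rhs, [0.0, 0.0])      # approximate fixed point (a*, x*)
M = jacobian(rhs, state)
trace = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]

print("fixed point:", state)
print("trace:", trace, "determinant:", det)
print("stable:", trace < 0 and det > 0)
```

For $d > 2$ one would compute the eigenvalues of $M$ directly with a linear algebra library rather than rely on the trace and determinant.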

Figure 24: Fixed points of the deterministic autoregulatory network ($f(x)$ plotted against protein concentration $x$). The network is monostable in the case of negative feedback (red curve) and weak positive feedback (dashed blue curve), but can exhibit bistability in the case of strong positive feedback (solid blue curve).

The fixed point is stable if the real part of each eigenvalue of the Jacobian matrix $M$ is negative. The $d$ eigenvalues are determined from the eigenvalue equation
\[ \sum_{j=1}^{d} M_{ij} v_j = \lambda v_i, \]
where $v_j$ is the $j$th component of the corresponding eigenvector. In matrix form, we have
\[ [M - \lambda I]\mathbf{v} = 0, \]
where $I$ is the identity matrix. Given that $\mathbf{v}$ is a non-zero vector, it follows that the matrix $[M - \lambda I]$ is not invertible. The condition of non-invertibility takes the form of the so-called characteristic equation
\[ \mathrm{Det}[M - \lambda I] = 0. \tag{7.6} \]
This yields a $d$th order polynomial in $\lambda$, which has $d$ (possibly complex-valued) roots.

7.5 Promoter binding/unbinding

The form of the nonlinear feedback function $g(x)$ depends on the nature of the promoter.

Single operator site. First, suppose that the promoter of a gene has a single operator site $OR_1$ for binding protein $X$, see Fig. 25 for the autoregulatory network. The gene is assumed to be OFF when $X$ is bound to the promoter and ON otherwise, as indicated in Fig. 25. The corresponding reaction scheme is
\[ OR_1 + X \underset{k_-}{\overset{k_+}{\rightleftharpoons}} OR_1X, \tag{7.7} \]

where $OR_1$ denotes the unbound operator site and $OR_1X$ denotes the bound site. Let $O_0$ and $O_1$ denote the corresponding unbound and bound promoter states, so that
\[ O_0 + X \underset{k_-}{\overset{k_+}{\rightleftharpoons}} O_1. \tag{7.8} \]
If the number of proteins is sufficiently large, then we have the reduced reaction scheme
\[ O_0 \underset{k_-}{\overset{k_+[X]}{\rightleftharpoons}} O_1, \]
where $[X]$ is the concentration of $X$.

Figure 25: Two-state promoter binding/unbinding model. (a) Autoregulatory feedback circuit. (b) Promoter states.

Let $M(t)$ label the promoter state at time $t$, with $M(t) = m$ if the state is currently $O_m$. Introduce the probability distribution $P_m(t) = \mathbb{P}[M(t) = m]$. From conservation of probability, $P_0(t) + P_1(t) = 1$. The transition rates then determine the probability of jumping from one state to the other in a small interval $\Delta t$:
\[ \mathbb{P}[M(t + \Delta t) = 1 \mid M(t) = 0] = k_+[X]\,\Delta t, \qquad \mathbb{P}[M(t + \Delta t) = 0 \mid M(t) = 1] = k_-\,\Delta t. \]
It follows that there are two possible ways for the promoter to be in the state $O_0$ at time $t + \Delta t$: either it remains in the same state or it jumps from the other state. That is,
\[ P_0(t + \Delta t) = \mathbb{P}[O_0 \to O_0]\,P_0(t) + \mathbb{P}[O_1 \to O_0]\,P_1(t) = (1 - k_+[X]\,\Delta t)\,P_0(t) + k_-\,\Delta t\,P_1(t). \]
Writing down a similar equation for the state $O_1$, dividing by $\Delta t$, and taking the limit $\Delta t \to 0$ leads to the pair of equations
\[ \frac{dP_0}{dt} = -k_+[X]\,P_0 + k_-\,P_1, \qquad \frac{dP_1}{dt} = k_+[X]\,P_0 - k_-\,P_1, \tag{7.9} \]
which are equivalent, since $P_0(t) + P_1(t) = 1$. Equation (7.9) has the unique stable steady state
\[ P_0^* = \frac{K}{[X] + K}, \qquad P_1^* = \frac{[X]}{[X] + K}, \qquad K = \frac{k_-}{k_+}. \tag{7.10} \]
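The relaxation of (7.9) to the steady state (7.10) can be checked with a few lines of code. This is a minimal sketch under assumed rate constants and a fixed protein concentration `X`; the function name `promoter_probabilities` is hypothetical.

```python
def promoter_probabilities(k_plus, k_minus, X, t_end, dt=1e-3, p0=1.0):
    """Forward-Euler integration of the two-state master equation (7.9),
    starting from P_0(0) = p0 and holding the concentration X fixed."""
    P0, P1 = p0, 1.0 - p0
    for _ in range(int(round(t_end / dt))):
        flux = k_plus * X * P0 - k_minus * P1  # net probability flow O_0 -> O_1
        P0 -= dt * flux
        P1 += dt * flux
    return P0, P1

# Assumed values in arbitrary units (not from the lecture)
k_plus, k_minus, X = 2.0, 1.0, 1.5

P0, P1 = promoter_probabilities(k_plus, k_minus, X, t_end=20.0)
K = k_minus / k_plus                          # dissociation constant
P0_star, P1_star = K / (X + K), X / (X + K)   # steady state (7.10)

print("P0:", P0, "vs", P0_star)
print("P1:", P1, "vs", P1_star)
```

The probabilities relax at the rate $k_+[X] + k_-$, so the comparison is only meaningful once $t$ is several multiples of $1/(k_+[X] + k_-)$.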

Here $K = k_-/k_+$ is a dissociation constant (inverse of the equilibrium constant). Assuming that the gene only produces $X$ in the unbound state, the kinetic equation for $x = [X]$ is
\[ \frac{dx}{dt} = \kappa_p P_0^* - \gamma_p x = \frac{\kappa_p K}{x + K} - \gamma_p x. \tag{7.11} \]
Similarly, if $X$ acts as its own activator, then the gene only produces protein in the bound state, and
\[ \frac{dx}{dt} = \kappa_p P_1^* - \gamma_p x = \frac{\kappa_p x}{x + K} - \gamma_p x. \tag{7.12} \]

Dimerization. Now suppose that the operator site only binds dimers (cooperative binding). The corresponding reaction schemes are now
\[ X + X \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} X_2, \qquad OR_1 + X_2 \underset{k_-}{\overset{k_+}{\rightleftharpoons}} OR_1X_2. \tag{7.13} \]
We assume that dimerization is the fastest reaction, so that the concentration of dimers is in quasi-equilibrium, that is, from the equilibrium law of mass action,
\[ [X_2] = \frac{k_1}{k_{-1}}\,[X]^2 = K_1 x^2, \]
where $K_1$ is the dimerization constant. The reduced reaction scheme for the two promoter states is now
\[ O_0 \underset{k_-}{\overset{k_+[X]^2}{\rightleftharpoons}} O_1, \]
where we have absorbed $K_1$ into $k_+$, and the corresponding master equation for the probabilities $P_m(t)$ is
\[ \frac{dP_0}{dt} = -k_+ x^2 P_0 + k_- P_1, \qquad \frac{dP_1}{dt} = k_+ x^2 P_0 - k_- P_1. \tag{7.14} \]
It follows that
\[ P_0^* = \frac{K}{x^2 + K}, \qquad P_1^* = \frac{x^2}{x^2 + K}. \tag{7.15} \]

Multiple operator sites. Now consider a promoter with two non-overlapping operator sites $OR_1$ and $OR_2$, each of which can independently bind a protein $X$. The two independent binding reactions are
\[ OR_1 + X \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} OR_1X, \qquad OR_2 + X \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} OR_2X. \tag{7.16} \]
We have four promoter states $O_m$, $m = 0, 1, 2, 3$, as follows:
$O_0$: $OR_1$ and $OR_2$ unoccupied
$O_1$: $OR_1$ occupied by $X$, $OR_2$ unoccupied
$O_2$: $OR_1$ unoccupied, $OR_2$ occupied by $X$
$O_3$: $OR_1$ and $OR_2$ each occupied by $X$

Figure 26: Multiple operator sites. (a) Autoregulatory feedback circuit. (b) Promoter states.

The reaction scheme for the promoter states is
\[ O_0 \underset{k_{-1}}{\overset{k_1[X]}{\rightleftharpoons}} O_1, \quad O_0 \underset{k_{-2}}{\overset{k_2[X]}{\rightleftharpoons}} O_2, \quad O_1 \underset{k_{-2}}{\overset{k_2[X]}{\rightleftharpoons}} O_3, \quad O_2 \underset{k_{-1}}{\overset{k_1[X]}{\rightleftharpoons}} O_3. \tag{7.17} \]
The corresponding steady-state probability distributions for the different promoter states are then
\[ P_0^* = \frac{1}{1 + [X]/K_1 + [X]/K_2 + [X]^2/K_1K_2}, \qquad P_1^* = \frac{[X]/K_1}{1 + [X]/K_1 + [X]/K_2 + [X]^2/K_1K_2}, \]
\[ P_2^* = \frac{[X]/K_2}{1 + [X]/K_1 + [X]/K_2 + [X]^2/K_1K_2}, \qquad P_3^* = \frac{[X]^2/K_1K_2}{1 + [X]/K_1 + [X]/K_2 + [X]^2/K_1K_2}, \]
with dissociation constants
\[ K_1 = \frac{k_{-1}}{k_1}, \qquad K_2 = \frac{k_{-2}}{k_2}. \]
Now suppose that promoter state $O_0$ produces protein at a rate $\kappa_p$, states $O_1$, $O_2$ produce at the reduced rate $\alpha\kappa_p$, $0 < \alpha < 1$, and state $O_3$ is off. It follows that the protein concentration evolves according to the kinetic equation
\[ \frac{dx}{dt} = \kappa_p P_0^* + \alpha\kappa_p (P_1^* + P_2^*) - \gamma_p x. \tag{7.18} \]

Promoter noise. In order to derive an explicit expression for nonlinear feedback terms in gene regulatory networks, one assumes that transcription factor/DNA binding is faster than other processes, such as synthesis and degradation (adiabatic assumption). One can then use the steady-state probabilities for the different promoter states in the kinetic equations for protein production and degradation. If the adiabatic assumption breaks down, then one has another source of intrinsic noise due to the stochastic nature of transcription factor binding (promoter noise).
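Under the adiabatic assumption, the production term in (7.18) is an explicit function of the protein concentration. The following sketch evaluates the four steady-state promoter probabilities and the resulting production rate; the dissociation constants, $\kappa_p$, $\alpha$ and the function names are illustrative assumptions.

```python
def promoter_state_probabilities(X, K1, K2):
    """Steady-state probabilities of the four promoter states O_0, ..., O_3
    for two independent operator sites with dissociation constants K1, K2."""
    weights = [1.0, X / K1, X / K2, X**2 / (K1 * K2)]
    Z = sum(weights)  # common denominator 1 + X/K1 + X/K2 + X^2/(K1*K2)
    return [w / Z for w in weights]

def production_rate(X, K1, K2, kappa_p, alpha):
    """Production term kappa_p*P_0 + alpha*kappa_p*(P_1 + P_2) of (7.18); state O_3 is off."""
    P0, P1, P2, _ = promoter_state_probabilities(X, K1, K2)
    return kappa_p * P0 + alpha * kappa_p * (P1 + P2)

# Assumed values in arbitrary units (not from the lecture)
K1, K2, kappa_p, alpha = 1.0, 4.0, 1.0, 0.5

probs = promoter_state_probabilities(2.0, K1, K2)
print("P_0..P_3 at X = 2:", probs, "sum =", sum(probs))

for X in (0.0, 0.5, 2.0, 8.0):
    print("X =", X, "production rate =", production_rate(X, K1, K2, kappa_p, alpha))
```

Because each site binds independently, the weights factorize as $(1 + [X]/K_1)(1 + [X]/K_2)$, which is why the four probabilities sum to one; the production rate falls from $\kappa_p$ at $[X] = 0$ towards zero as both sites become occupied.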