Carbon labeling for Metabolic Flux Analysis
Nicholas Wayne Henderson
Computational and Applied Mathematics

Abstract
Metabolic engineering is rapidly growing as a discipline. Applications in pharmaceuticals, energy, and materials have the potential for a large commercial impact. This paper covers the basics of a mathematical toolset that uses carbon labeling information to assist in metabolic flux analysis. The purpose is to introduce users to the inner workings of a software tool developed in Matlab. It is hoped that this work makes the process and mathematics of Metabolic Flux Analysis easier to understand and implement.
Background
Through studies in molecular biology and genetics, researchers have amassed significant knowledge of cellular metabolism and the genetic regulation systems that control it. Metabolic engineers strive to use this knowledge to manipulate metabolic reaction networks in order to achieve some objective. The text Metabolic Engineering (Stephanopoulos 1998) presents the following definition: "metabolic engineering [is] the directed improvement of product formation or cellular properties through the modification of specific biochemical reactions or the introduction of new ones with the use of recombinant DNA technology."

The key work to be carried out in metabolic engineering is selecting the reactions for modification. Once the reactions are targeted, established recombinant techniques are used to inhibit or amplify the genes corresponding to the necessary enzymes. Metabolic engineers use Metabolic Flux Analysis (MFA) to target the specific reactions.

The flux of a metabolic reaction is the rate at which input metabolites are converted to output metabolites. As an example, one might write the following reaction:

\[ A + B \xrightarrow{\;v\;} C \]

where $A$, $B$, and $C$ are metabolites and $v$ is the flux. The flux $v$ measures the rate at which metabolites A and B are consumed and C is produced. The purpose of MFA is to learn the fluxes of as many reactions in a metabolic pathway as possible. This knowledge gives the engineer an idea of which reactions are more active in a pathway. The engineer is then better able to predict the outcome of manipulations and can choose the ones that help achieve the objective.

Uncovering the intracellular fluxes is a complicated process. We want to know the flux of reactions deep inside a pathway, where the pools of intermediate metabolites are too small to measure. Basic stoichiometry, combined with knowledge of extracellular conditions, can be used to build a system of equations in order to solve for the flux values.
The catch is that these systems are often underdetermined because there are more fluxes than metabolites in a pathway. Methods to solve this problem involve the use of $^{13}$C labeling and knowledge of carbon skeleton rearrangements [1]. An engineer who knows the labeling patterns of the input metabolites and can observe the labeling patterns of the products is able to determine intracellular fluxes with greater certainty. The Metabolic Engineering research group in the Computational and Applied Mathematics department is currently interested in ways that $^{13}$C labeling experiments can augment the underdetermined stoichiometric system in order to fully solve for the intracellular fluxes.
Mathematics of Metabolic Flux Analysis
The literature on carbon labeling experiments for metabolic flux analysis is quite complex. This stems from a confusing notation that must handle various fluxes, metabolites, and other constraints. This section attempts to convey the mathematics behind this project in order to introduce users to the inner workings of the Matlab toolset. Care will be taken to keep the notation simple. Readers are encouraged to explore the literature for analysis in greater depth.

Example System
This section uses an example from Wiechert's paper, "Bidirectional Reaction Steps in Metabolic Networks: III. Explicit Solution and Analysis of Isotopomer Labeling Systems." It is a vastly simplified version of the citric acid cycle:

Figure 1: The example network. Reactions 2 and 3 are bidirectional.

\[
\begin{array}{rcl}
A & \xrightarrow{\;f_1\;} & B \\
B & \underset{b_2}{\overset{f_2}{\rightleftharpoons}} & E \\
B + E & \underset{b_3}{\overset{f_3}{\rightleftharpoons}} & C \\
E & \xrightarrow{\;f_4\;} & H \\
C & \xrightarrow{\;f_6\;} & D + F \\
D & \xrightarrow{\;f_7\;} & E + G
\end{array}
\]

Figure 2: The reaction list corresponding to the example metabolic network in Figure 1. Flux variables are denoted $f$ for forward and $b$ for backward.

Here A is the input metabolite. B, C, D, and E are intracellular metabolites. K, F, and H are output metabolites. Experimenters are typically able to measure only the fluxes corresponding to input and output metabolites. Thus, we must use modeling to gain information regarding the intracellular metabolites.

It should be noted that the reaction fluxes are simply the rates at which educts are consumed and products are formed. Flux values should not be considered constants in the sense of reaction kinetics. A metabolic system with constant inputs will nevertheless settle to constant flux values. If certain components of the system are modified, such as input metabolite or enzyme concentrations, the flux values will change.

Stationarity of Intracellular Metabolites
The key assumption in metabolic flux analysis is that the concentrations of intracellular metabolites are in a steady state.
Thus, buildup or depletion of these metabolites does not occur within a cell. They are created as fast as they are consumed. This is called a dynamic equilibrium and can be expressed by the following equations:

\[ \frac{d[B]}{dt} = \frac{d[C]}{dt} = \frac{d[D]}{dt} = \frac{d[E]}{dt} = 0 \tag{1} \]

This allows the system to be modeled in terms of stoichiometric balance equations.

Stoichiometry
Each reaction is associated with a flux. Bidirectional reactions have 2 fluxes: one for the forward direction and one for the reverse direction. The flux describes the rate at which educts are turned into products. Referring back to the example, metabolite B is created at a rate of $f_1 + b_2 + b_3$ and consumed at a rate of $f_2 + f_3 + f_5$. The following balance equations result when considering all intracellular metabolites:

\[
\begin{aligned}
\frac{d[B]}{dt} &= f_1 - f_2 + b_2 - f_3 + b_3 - f_5 = 0 \\
\frac{d[C]}{dt} &= f_3 - b_3 - f_6 = 0 \\
\frac{d[D]}{dt} &= f_6 - f_7 = 0 \\
\frac{d[E]}{dt} &= f_2 - b_2 - f_3 + b_3 - f_4 + f_7 = 0
\end{aligned}
\tag{2}
\]

These can be written as a linear system:

\[
\begin{pmatrix}
1 & -1 & 1 & -1 & 1 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \\
0 & 1 & -1 & -1 & 1 & -1 & 0 & 0 & 1
\end{pmatrix}
v =
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]

or,

\[ Sv = 0 \tag{3} \]

where $v = [f_1, f_2, b_2, f_3, b_3, f_4, f_5, f_6, f_7]^T$ is the vector of fluxes. This system is underdetermined: here $S$ gives four equations in nine unknowns. This is a common feature of realistic networks, which typically have many more fluxes than metabolites.

The overall goal is to uncover the flux vector $v$. The stoichiometric balance system imposes constraints on $v$, but cannot be used to obtain a unique solution. We plan on using carbon labeling information to add more constraints to this system.
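The degree of underdetermination can be checked directly from the stoichiometric matrix. The following is a minimal sketch in Python (an assumption made for illustration; the tool described in this paper is written in Matlab). The helper name `rref` and the use of exact fractions are choices made here, not part of the original toolset. It enters $S$ row by row from the balance equations [eq. 2] and counts how many flux values stoichiometry alone leaves undetermined.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination with exact fractions.

    Returns the reduced matrix and the list of pivot column indices.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a usable pivot in column c, at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Column order follows the flux vector v in the text.
FLUXES = ["f1", "f2", "b2", "f3", "b3", "f4", "f5", "f6", "f7"]
S = [
    [1, -1, 1, -1, 1, 0, -1, 0, 0],   # balance of B
    [0, 0, 0, 1, -1, 0, 0, -1, 0],    # balance of C
    [0, 0, 0, 0, 0, 0, 0, 1, -1],     # balance of D
    [0, 1, -1, -1, 1, -1, 0, 0, 1],   # balance of E
]

_, pivots = rref(S)
rank_S = len(pivots)
degrees_of_freedom = len(FLUXES) - rank_S
print(rank_S, degrees_of_freedom)  # 4 5
```

The four balance equations are linearly independent, so stoichiometry alone leaves five of the nine fluxes undetermined.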
It is possible to measure some extracellular fluxes, namely those that enter or exit the system with no other involvement. In the example, suppose fluxes $f_1$ and $f_5$ are directly measurable. This information can be used to write a flux measurement system:

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}
v =
\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
\]

or,

\[ Hv = w \tag{4} \]

The combination of the stoichiometric [eq. 3] and flux measurement [eq. 4] systems reduces the dimensionality of the flux identification problem. First, an optimization method is used to find an initial solution $v_0$ to the underdetermined system

\[
\begin{pmatrix} S \\ H \end{pmatrix} v = \begin{pmatrix} 0 \\ w \end{pmatrix}
\]

that forces all of the fluxes to be positive ($v_0 > 0$). Second, Gaussian elimination or the singular value decomposition is used to find the null space of the augmented matrix $[S; H]$. The result is the expression of all fluxes in terms of a few free fluxes:

\[ v = Nx + v_0 \tag{5} \]

where $v$ is the desired vector of fluxes, the columns of $N$ span the null space, $x$ is the vector of free fluxes, and $v_0$ is the initial solution. In the example there are 9 fluxes in total, but only 3 free fluxes: the augmented matrix $[S; H]$ has 6 linearly independent rows, leaving $9 - 6 = 3$ degrees of freedom. Because of stoichiometry and flux measurement, only 3 flux values need to be uncovered in order to solve the entire system. In order to move forward, we must use information gained from carbon labeling experiments.

Isotopomers
Carbon atoms are naturally found in 3 forms: $^{12}$C, with 6 protons and 6 neutrons, has a natural relative abundance of 98.9%; $^{13}$C, with 7 neutrons, has a natural relative abundance of 1.1%; and $^{14}$C, with 8 neutrons, exists in trace amounts ($^{14}$C is the radioactive isotope used in carbon dating). $^{13}$C is useful in MFA because it can be distinguished from $^{12}$C through techniques such as NMR and mass spectrometry.
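Returning briefly to the constrained flux representation [eq. 5], the reduction to free fluxes can be sketched with Gaussian elimination on the augmented matrix $[S; H]$. The following Python sketch makes several assumptions that are not from the original toolset: Python is used in place of Matlab, the measured values $w = (10, 2)$ are hypothetical, and the particular solution is obtained by setting the free fluxes to zero instead of by the positivity-enforcing optimization described above. The resulting `v0` is therefore a valid solution of the augmented system but is not guaranteed to be strictly positive.

```python
from fractions import Fraction

def rref(rows, ncols):
    """Gauss-Jordan elimination; pivots are sought only in the first ncols columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

FLUXES = ["f1", "f2", "b2", "f3", "b3", "f4", "f5", "f6", "f7"]
S = [
    [1, -1, 1, -1, 1, 0, -1, 0, 0],
    [0, 0, 0, 1, -1, 0, 0, -1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, -1],
    [0, 1, -1, -1, 1, -1, 0, 0, 1],
]
H = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0],  # f1 is measured
    [0, 0, 0, 0, 0, 0, 1, 0, 0],  # f5 is measured
]
w = [Fraction(10), Fraction(2)]   # hypothetical measurements

n = len(FLUXES)
augmented = [row + [0] for row in S] + [row + [wi] for row, wi in zip(H, w)]
R, pivots = rref(augmented, n)
free = [c for c in range(n) if c not in pivots]

# Particular solution: free fluxes set to zero (not the optimized v0 of the text).
v0 = [Fraction(0)] * n
for r, c in enumerate(pivots):
    v0[c] = R[r][n]

# Null-space basis: one column of N per free flux.
N_cols = []
for fcol in free:
    col = [Fraction(0)] * n
    col[fcol] = Fraction(1)
    for r, c in enumerate(pivots):
        col[c] = -R[r][fcol]
    N_cols.append(col)

def flux_vector(x):
    """Evaluate v = N x + v0 for a choice of free fluxes x."""
    return [v0[i] + sum(xj * col[i] for xj, col in zip(x, N_cols)) for i in range(n)]

print([FLUXES[c] for c in free])  # ['b2', 'b3', 'f7']
```

With this column order the free fluxes come out as $b_2$, $b_3$, and $f_7$. A different ordering of the fluxes would select a different but equivalent set of three.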
The metabolites of interest are typically molecules based on a carbon chain 1 to 6 atoms long. The word isotopomer describes a particular isotope isomer, that is, a carbon-based molecule with a specific labeling pattern. As an example, consider a metabolite with 3 carbons:

Figure 3: Isotopomers of a 3-carbon metabolite, grouped as unlabeled, one $^{13}$C atom, two $^{13}$C atoms, and fully labeled. Light circles represent $^{12}$C atoms. Dark circles represent $^{13}$C atoms.

There are a total of 8 isotopomers ($2^3$) with distinct labeling patterns in a molecule with a 3-carbon chain (assuming the molecule is not symmetric).

Isotopomer measurements combined with knowledge of the carbon transitions in a metabolic network can help us gain information about the flux values. To this end, a system of differential equations is formed to model the network. The problem is that these equations are non-linear and cannot be solved analytically. Nevertheless, they are helpful. Wiechert analyzed the balance equations and found that isotopomer systems are stable: after a period of time, the isotopomer fractions reach steady values. This allows us to think of isotopomers in terms of balance equations. Wiechert went on to show that the isotopomer balance equations can be transformed into a compartmental linear system. To understand this, one must first grasp Wiechert's conceptual invention of cumomers.

Cumomers
The word cumomer describes a cumulative isotopomer. It is not a distinct physical entity, like an isotopomer, but rather a collection of isotopomers. Cumomers are marked just as isotopomers are labeled. The difference is that with cumomers specific carbons are marked if they have the potential to be labeled in the $^{13}$C sense. For example, look at the hypothetical 5-carbon molecule in figure 4.
Figure 4: Dark carbons are marked or labeled.

The example cumomer in figure 4 is marked in the 2 and 4 positions. This cumomer represents the collection of the 4 isotopomers on the right. A completely unmarked cumomer represents the unlabeled isotopomer. A completely marked cumomer represents all possible isotopomers.

Thinking of the labeled system in terms of cumomers reduces the complexity of the mathematics at very little cost. Values for isotopomer fractions and cumomer fractions are interchangeable through a simple linear transformation. This allows for the analytical solution of all intracellular cumomer fractions in terms of input cumomer fractions and flux values. This is possible because the cumomer balance equations are a compartmental linear system.

Central to all of this is the notion of cumomer weight, which is the number of marked carbons. For example, the cumomer in figure 4 has a weight of 2, because it has 2 marked carbons. Cumomer balance systems are compartmentalized in terms of weight. For each weight there exists a square linear system. Say we have a system with a maximum weight of 3. The linear systems could be written as follows:

\[
\begin{aligned}
F_1 y_1 &= r_1 \\
F_2 y_2 &= r_2(y_1) \\
F_3 y_3 &= r_3(y_2, y_1)
\end{aligned}
\]

where the subscripts denote weight. In this notation $y_i$ is the fraction vector for cumomers of weight $i$, and $F_i$ is the flux matrix for weight $i$. It is called a flux matrix because its elements are expressions of flux values only. $r_i$ is the right-hand-side vector. An important property of $r_i$ is that its elements depend only on flux values and lower-weight cumomer fractions. A cumomer of a given weight never depends on a cumomer of higher weight. In the cumomer balance equations above this is shown by $r_i$'s dependence on lower-weight cumomers only. Solving the systems from low to high weight leads to the complete solution for the cumomer fractions:
\[
\begin{aligned}
y_1 &= F_1^{-1} r_1 \\
y_2 &= F_2^{-1} r_2(y_1) \\
y_3 &= F_3^{-1} r_3(y_2, y_1)
\end{aligned}
\tag{6}
\]

The cumomer balance equations can also be written in another form:

\[ Cv = 0 \tag{7} \]

This is closer to the stoichiometric balance equations. $C$ is called a cumomer matrix, because its elements are expressions involving cumomer fractions, and $v$ is the vector of fluxes. Each row of $C$ corresponds to the balance equation for a single cumomer. This form is useful in determining whether labeling experiments will help to fully uncover the intracellular fluxes.

Information Analysis
This section describes how to use the stoichiometric and cumomer balance systems to determine whether labeling experiments will completely uncover the intracellular fluxes. The process is fairly straightforward. First, we obtain the flux information system by substituting the constrained flux equation [eq. 5] into the cumomer balance [eq. 7]:

\[
\begin{aligned}
Cv &= 0 \\
C(Nx + v_0) &= 0 \\
CNx &= -Cv_0 \\
Mx &= -\mathbf{m}
\end{aligned}
\]

Here $M = CN$ is called the flux information matrix, $\mathbf{m} = Cv_0$, and $x$ is the vector of free fluxes gained from stoichiometry. $M \in \mathbb{R}^{m \times n}$, where $m$ (the number of matrix rows) is the number of intracellular cumomers and $n$ (the number of matrix columns) is the number of free fluxes. For realistic networks $m > n$, so the flux information system is overdetermined.

Next, we must remove redundancies from the elements of $M$. This is done by replacing all of the cumomer fraction values in $M$ with their free representation [eq. 6], obtained by solving the linear compartmental system. Finally, Gaussian elimination is used to analyze the rank of $M$. If the rank of $M$ equals the dimension of $x$, the system is completely solvable. Even if $M$ is rank deficient (its rank is less than the dimension of $x$), it is able to further constrain the flux values. The metabolic engineer should use this information to decide whether pursuit of carbon labeling data is worthwhile.
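The final rank test can be sketched as follows. This is an illustration in Python under stated assumptions, not the symbolic analysis performed by the Matlab tool. The columns in `N_COLS` are one null-space basis of the example's augmented matrix $[S; H]$, taking $b_2$, $b_3$, and $f_7$ as the free fluxes. The rows of `C_PARTIAL` and `C_FULL` are hypothetical placeholder values chosen only to exercise the test, and are not derived from any carbon transition map. The sketch also evaluates $M$ numerically at fixed values, whereas the analysis described above works with symbolic expressions.

```python
from fractions import Fraction as Fr

def rank(rows):
    """Rank by forward Gaussian elimination with exact fractions."""
    m = [[Fr(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Flux order: f1, f2, b2, f3, b3, f4, f5, f6, f7.
# Null-space basis of [S; H], one column per free flux (assumed: b2, b3, f7).
N_COLS = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0],     # direction of b2
    [0, 0, 0, 1, 1, 0, 0, 0, 0],     # direction of b3
    [0, -1, 0, 1, 0, -1, 0, 1, 1],   # direction of f7
]

def info_matrix(C):
    """Form M = C N, with N given by its columns."""
    return [[sum(Fr(c) * n for c, n in zip(row, col)) for col in N_COLS] for row in C]

# Hypothetical cumomer-matrix rows (placeholders, not real labeling data).
C_PARTIAL = [
    [0, Fr(1, 2), Fr(-1, 4), 0, 0, 0, 0, 0, 0],
    [0, 0, 0, Fr(1, 3), Fr(-1, 3), 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, Fr(1, 5), Fr(-1, 2)],
]
C_FULL = C_PARTIAL + [
    [0, 0, 0, Fr(1, 2), Fr(1, 4), 0, 0, 0, 0],
]

for name, C in [("partial", C_PARTIAL), ("full", C_FULL)]:
    r = rank(info_matrix(C))
    verdict = "all free fluxes identifiable" if r == len(N_COLS) else "rank deficient"
    print(name, r, verdict)
# partial 2 rank deficient
# full 3 all free fluxes identifiable
```

In the rank-deficient case the labeling information still removes two of the three degrees of freedom, which matches the remark above that a rank-deficient $M$ further constrains the fluxes.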
Matlab Tool
A set of Matlab tools has been developed to carry out the mathematics described above. Documentation and examples can be found online (code is available on Dr. Steve Cox's Metabolic Engineering website). The user must specify a metabolic network with carbon transitions and feed it into the code. The tool creates all of the necessary data structures and carries out the full symbolic information analysis. At the end, it reports the rank of the information matrix in comparison to the dimension of the free flux vector.

Conclusions
This project has completed two goals: to absorb the information spread throughout Wiechert's papers and to implement the ideas in Matlab. This paper has covered the basics of the mathematics behind metabolic flux analysis using carbon labeling experiments. Future work should extend the code to handle measurement and statistical concerns.

References
1. Nicole Isermann and Wolfgang Wiechert. Metabolic isotopomer labeling systems. Part II: structural flux identifiability analysis. Mathematical Biosciences, Volume 183, Issue 2, June 2003, Pages 175-214. (http://www.sciencedirect.com/science/article/b6vhx-47yydk2-1/2/777de4a375569b9ebbbb6f383e9a8875)
2. Wolfgang Wiechert and Michael Wurzel. Metabolic isotopomer labeling systems. Part I: global dynamic behavior. Mathematical Biosciences, Volume 169, Issue 2, February 2001, Pages 173-205. (http://www.sciencedirect.com/science/article/b6vhx-428fk7c-4/2/9d6dbbd409d4e2ebf821021e3e4dbb48)
3. Wolfgang Wiechert et al. Bidirectional Reaction Steps in Metabolic Networks: III. Explicit Solution and Analysis of Isotopomer Labeling Systems. Biotechnology and Bioengineering, Volume 66, Number 2, July 1999, Pages 69-85.
4. Gregory N. Stephanopoulos, Aristos A. Aristidou, and Jens Nielsen. Metabolic Engineering: Principles and Methodologies. San Diego: Academic Press, 1998.
5. Thomas Szyperski. 13C-NMR, MS and metabolic flux balancing in biotechnology research. Quarterly Reviews of Biophysics, 31, 1 (1998), Pages 41-106.