State-Feedback Control of Partially-Observed Boolean Dynamical Systems Using RNA-Seq Time Series Data

Mahdi Imani and Ulisses Braga-Neto
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
Email: m.imani88@tamu.edu, ulisses@ece.tamu.edu

Abstract—External control of a genetic regulatory network is used for the purpose of avoiding undesirable states, such as those associated with disease. This paper proposes a strategy for state-feedback infinite-horizon control of Partially-Observed Boolean Dynamical Systems (POBDS) using a single time series of Next-Generation Sequencing (NGS) RNA-seq data. A separation principle is assumed, whereby the optimal stationary policy is first obtained offline by solving Bellman's equation, and an optimal MMSE observer, the Boolean Kalman Filter, is then employed for online implementation of the policy using the RNA-seq observations of the evolving system. Performance is investigated using a Boolean network model of the mutated mammalian cell cycle and simulated RNA-seq observations.

Index Terms—State-Feedback Control, Boolean Dynamical Systems, Boolean Kalman Filter, Next-Generation Sequencing.

I. INTRODUCTION

A fundamental problem in genomic signal processing is to design intervention strategies for gene regulatory networks that beneficially alter network dynamics. This is usually done to reduce the steady-state mass of undesirable states, such as cell-proliferation states, which may be associated with cancer [1]. Boolean networks [2] have emerged as an effective model of the dynamical behavior of gene regulatory networks: genes are in activated/inactivated states, and the relationships among them are governed by logical rules updated at discrete time intervals [3], [4]. To date, different modeling approaches, such as Probabilistic Boolean Networks (PBNs) [3], S-systems [5], and Bayesian networks [6], have been proposed in the literature to mathematically capture the behavior of genetic regulatory networks.
In addition, various intervention approaches [1], [7]–[9] have been developed. These all assume that the system states are directly measurable, and they are mostly set in the framework of Probabilistic Boolean Networks (PBNs). In this paper, we consider a signal model with distinct state and observation processes; thus, the transcriptional state of the genes is not assumed to be observable directly, but only through indirect RNA-seq measurements [4], [10]–[12]. First, we apply the theory of infinite-horizon optimal stochastic control to find the stationary policy that optimally controls the evolution of a Boolean dynamical system. Next, we apply the optimal MMSE observer for the proposed signal model, known as the Boolean Kalman Filter [4], to estimate the state of the system, to which the previously obtained control policy is applied. Performance is investigated using a Boolean network model of the mutated mammalian cell cycle and simulated RNA-seq observations.

II. PARTIALLY-OBSERVED BOOLEAN DYNAMICAL SYSTEMS

Deterministic Boolean network models are unable to cope with (1) uncertainty in state transitions due to system noise and the effect of unmodeled variables, and (2) the fact that the Boolean states of a system are never observed directly; this calls for a stochastic approach. We describe below the stochastic signal model for partially-observed Boolean dynamical systems, first proposed in [4], which will be employed here.

A. State Model

We assume that the system is described by a state process $\{\mathbf{X}_k;\, k = 0, 1, \ldots\}$, where $\mathbf{X}_k \in \{0,1\}^d$ is a Boolean vector of size $d$; in the case of a gene regulatory network, the components of $\mathbf{X}_k$ represent the activation/inactivation state of the genes at time $k$. The state is affected by a sequence of control inputs
$\{\mathbf{u}_k;\, k = 0, 1, \ldots\}$, where $\mathbf{u}_k \in U$ represents a purposeful intervention into the system state; in the biological example, this might model drug applications. The state evolution is thus specified by the following discrete-time nonlinear signal model:

$$\mathbf{X}_k \,=\, \mathbf{f}(\mathbf{X}_{k-1}, \mathbf{u}_{k-1}) \oplus \mathbf{n}_k \qquad (1)$$

for $k = 1, 2, \ldots$, where $\mathbf{f}: \{0,1\}^d \times U \rightarrow \{0,1\}^d$ is a network function, $\{\mathbf{n}_k;\, k = 1, 2, \ldots\}$ is a white state noise process with $\mathbf{n}_k \in \{0,1\}^d$, and $\oplus$ indicates component-wise modulo-2 addition. The noise is white in the sense that it is uncorrelated, i.e., $\mathbf{n}_k$ and $\mathbf{n}_l$ are uncorrelated for $k \neq l$. In addition, the noise process is assumed to be uncorrelated from the state process and control input.

B. Observation Model

In most real-world applications, the system state is only partially observable, and distortion is introduced into the observations by sensor noise. Let $\{\mathbf{Y}_k;\, k = 1, 2, \ldots\}$ be the observation process, which is the actual data obtained from the experiment. The observation $\mathbf{Y}_k$ is formed from the state $\mathbf{X}_k$ through the equation:

$$\mathbf{Y}_k \,=\, \mathbf{h}(\mathbf{X}_k, \mathbf{v}_k) \qquad (2)$$

for $k = 1, 2, \ldots$, where $\{\mathbf{v}_k;\, k = 1, 2, \ldots\}$ is a white observation noise process, which is uncorrelated from the state process. We consider here the specific case of RNA-seq transcriptomic data, in which case $\mathbf{Y}_k = (Y_{k1}, \ldots, Y_{kd})$ contains the RNA-seq data at time $k$; for a single-lane NGS platform, $Y_{kj}$ is the read count corresponding to transcript $j$ in the single lane, for $j = 1, \ldots, d$. In this paper, we choose to use a Negative Binomial model for the number of reads for each transcript:

$$P(Y_{kj} = m \mid X_{kj}) \,=\, \frac{\Gamma(m + \phi_j)}{m!\,\Gamma(\phi_j)} \left(\frac{\lambda_{kj}}{\lambda_{kj} + \phi_j}\right)^{\!m} \left(\frac{\phi_j}{\lambda_{kj} + \phi_j}\right)^{\!\phi_j} \qquad (3)$$

for $m = 0, 1, \ldots$, where $\Gamma$ denotes the Gamma function, and $\phi_j, \lambda_{kj} > 0$ are the real-valued inverse dispersion parameter and mean read count of transcript $j$ at time $k$, respectively, for $j = 1, \ldots, d$. The inverse dispersion parameter models observation noise: the smaller it is, the more variable the measurements are.
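As an illustration, the state recursion (1) can be simulated directly. The two-gene network function and noise level below are hypothetical stand-ins for illustration only, not the cell-cycle model used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, u, f, p):
    """One step of the state model (1): X_k = f(X_{k-1}, u_{k-1}) XOR n_k,
    where n_k has independent Bernoulli(p) components."""
    noise = rng.random(len(x)) < p              # component-wise perturbation
    return np.logical_xor(f(x, u), noise).astype(int)

# Illustrative 2-gene network function (NOT the cell-cycle model of
# Section VI): gene 0 copies gene 1, gene 1 is the AND of both genes,
# and the input u = 1 flips gene 1.
def f(x, u):
    nxt = np.array([x[1], x[0] & x[1]])
    if u == 1:
        nxt[1] ^= 1
    return nxt

x = np.array([1, 0])
for k in range(5):
    x = step(x, u=0, f=f, p=0.01)               # noisy uncontrolled run
```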
Now recall that, according to the Boolean state model, there are two possible states for the abundance of transcript $j$: high, if $X_{kj} = 1$, and low, if $X_{kj} = 0$. Accordingly, we model the parameter $\lambda_{kj}$ in log space as:

$$\log \lambda_{kj} \,=\, \log s + \mu + \delta_j X_{kj} \qquad (4)$$

where the parameter $s$ is the sequencing depth (which is instrument-dependent), $\mu > 0$ is the baseline level of expression in the inactivated transcriptional state, and $\delta_j > 0$ expresses the effect on the observed RNA-seq read count as gene $j$ goes from the inactivated to the activated state, for $j = 1, \ldots, d$. Typical values for all these parameters are given in Section VI.

III. BOOLEAN KALMAN FILTER

The optimal filtering problem consists of, given the history of observations $\mathbf{Y}_1, \ldots, \mathbf{Y}_k$ up to the present time $k$, finding an estimator $\hat{\mathbf{X}}_k(\mathbf{Y}_1, \ldots, \mathbf{Y}_k)$ of the state $\mathbf{X}_k$ that optimizes a given performance criterion. The criterion considered here is the (conditional) mean square error:

$$\mathrm{MSE}(\mathbf{Y}_1, \ldots, \mathbf{Y}_k) \,=\, E\left[\|\hat{\mathbf{X}}_k - \mathbf{X}_k\|^2 \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_k\right] \qquad (5)$$

The minimum mean-square error (MMSE) state estimator for the model described in the previous sections is the Boolean Kalman Filter (BKF). A recursive algorithm for the exact computation of the BKF for a general signal model was given in [4]. Briefly, let $(\mathbf{x}^1, \ldots, \mathbf{x}^{2^d})$ be an arbitrary enumeration of the possible state vectors. For each time $k = 1, 2, \ldots$, define the posterior distribution vectors (PDV) $\Pi_{k|k}$ and $\Pi_{k|k-1}$, of length $2^d$, by

$$(\Pi_{k|k})_i \,=\, P(\mathbf{X}_k = \mathbf{x}^i \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_k) \qquad (6)$$
$$(\Pi_{k|k-1})_i \,=\, P(\mathbf{X}_k = \mathbf{x}^i \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_{k-1}) \qquad (7)$$

for $i = 1, \ldots, 2^d$. Let the prediction matrix $M_k$, of size $2^d \times 2^d$, be the transition matrix of the controlled Markov chain corresponding to the state process:

$$(M_k)_{ij} \,=\, P(\mathbf{X}_k = \mathbf{x}^i \mid \mathbf{X}_{k-1} = \mathbf{x}^j, \mathbf{u}_{k-1} = \mathbf{u}) \,=\, P\big(\mathbf{n}_k = \mathbf{x}^i \oplus \mathbf{f}(\mathbf{x}^j, \mathbf{u})\big) \qquad (8)$$

for $i, j = 1, \ldots, 2^d$. On the other hand, assuming conditional independence among the measurements given the state, the update matrix $T_k$, also of size $2^d \times 2^d$, is a diagonal matrix defined by:

$$(T_k(\mathbf{y}_k))_{ii} \,=\, P(\mathbf{Y}_k = \mathbf{y}_k \mid \mathbf{X}_k = \mathbf{x}^i) \,=\, \prod_{j=1}^{d} \frac{\Gamma(Y_{kj} + \phi_j)}{Y_{kj}!\,\Gamma(\phi_j)} \left(\frac{s\, e^{\mu + \delta_j (\mathbf{x}^i)_j}}{s\, e^{\mu + \delta_j (\mathbf{x}^i)_j} + \phi_j}\right)^{\!Y_{kj}} \left(\frac{\phi_j}{s\, e^{\mu + \delta_j (\mathbf{x}^i)_j} + \phi_j}\right)^{\!\phi_j} \qquad (9)$$
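The diagonal entries of $T_k(\mathbf{y}_k)$ in (9) can be evaluated in log space to avoid numerical underflow for large read counts. The sketch below assumes `states` enumerates the $2^d$ Boolean state vectors; the function names are illustrative.

```python
import math
import numpy as np

def nb_logpmf(m, lam, phi):
    """Log of the Negative Binomial pmf (3) with mean lam and
    inverse dispersion phi."""
    return (math.lgamma(m + phi) - math.lgamma(m + 1) - math.lgamma(phi)
            + m * math.log(lam / (lam + phi))
            + phi * math.log(phi / (lam + phi)))

def update_diagonal(y, states, s, mu, delta, phi):
    """Diagonal of the update matrix T_k(y_k) in (9): likelihood of the
    read-count vector y under each candidate Boolean state x^i, with
    lambda_j(x) = s * exp(mu + delta_j * x_j) as in (4)."""
    diag = np.empty(len(states))
    for i, x in enumerate(states):
        loglik = 0.0
        for j, xj in enumerate(x):
            lam = s * math.exp(mu + delta[j] * xj)
            loglik += nb_logpmf(y[j], lam, phi[j])
        diag[i] = math.exp(loglik)
    return diag
```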
for $i = 1, \ldots, 2^d$. For a vector $\mathbf{v} \in [0,1]^d$, define the binarized vector $\bar{\mathbf{v}}$ by $\bar{\mathbf{v}}(i) = I_{\mathbf{v}(i) > 1/2}$, the complement vector $\mathbf{v}^c(i) = 1 - \mathbf{v}(i)$, for $i = 1, \ldots, d$, and the $L_1$-norm $\|\mathbf{v}\|_1 = \sum_{i=1}^{d} |\mathbf{v}(i)|$. Finally, define the matrix $A$, of size $d \times 2^d$, via $A = [\mathbf{x}^1 \cdots \mathbf{x}^{2^d}]$. Then it can be shown that the optimal MMSE estimator $\hat{\mathbf{X}}_k$ can be computed by Algorithm 1.

Algorithm 1 Boolean Kalman Filter
1: Initialization Step: the initial PDV is given by $(\Pi_{0|0})_i = P(\mathbf{X}_0 = \mathbf{x}^i)$, for $i = 1, \ldots, 2^d$.
For $k = 1, 2, \ldots$, do:
2: Prediction Step: given the previous PDV $\Pi_{k-1|k-1}$, the predicted PDV is $\Pi_{k|k-1} = M_k\, \Pi_{k-1|k-1}$.
3: Update Step: given the current observation $\mathbf{Y}_k = \mathbf{y}_k$, let $\beta_k = T_k(\mathbf{y}_k)\, \Pi_{k|k-1}$. The updated PDV $\Pi_{k|k}$ is obtained by normalizing $\beta_k$ to obtain a probability measure: $\Pi_{k|k} = \beta_k / \|\beta_k\|_1$.
4: MMSE Estimator Computation Step: the MMSE estimator is given by $\hat{\mathbf{X}}_k = \overline{A \Pi_{k|k}}$, with optimal conditional MSE $\mathrm{MSE}(\mathbf{Y}_1, \ldots, \mathbf{Y}_k) = \|\min\{A \Pi_{k|k}, (A \Pi_{k|k})^c\}\|_1$.

IV. DISCOUNTED-COST INFINITE-HORIZON CONTROL

In this section, the infinite-horizon control problem is formulated for the case in which the states of the system are directly observed. We will state the problem and give its solution without proof; for more details, the reader is referred to [7], [13]. The goal of control is to select the appropriate external input $u^{ext}_k \in U_{ext}$ at each time $k$ so as to make the network spend the least amount of time, on average, in undesirable states, e.g., states associated with cell proliferation, which may be associated with cancer [1]. In formal terms, assuming a bounded cost of control $g(\mathbf{X}_k, u^{ext}_k)$, which is a function of the state and the intervention at time $k$, our goal is to find a stationary control policy $\mu: \{0,1\}^d \rightarrow U_{ext}$ which minimizes the infinite-horizon cost (for a given initial state $\mathbf{X}_0 = \mathbf{x}^j$):

$$J_\mu(j) \,=\, \lim_{m \to \infty} E\left[\sum_{k=0}^{m} \gamma^k\, g\big(\mathbf{X}_k, \mu(\mathbf{X}_k)\big)\right] \qquad (10)$$

for $j = 1, \ldots, 2^d$, where $0 < \gamma < 1$ is a discounting factor that ensures that the limit of the finite sums converges as the horizon length $m$ goes to infinity.
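A single recursion of Algorithm 1 might be sketched as follows; the array layout (states stored as rows, i.e., the transpose of $A$) and the function name are our own conventions.

```python
import numpy as np

def bkf_step(pdv, M, t_diag, states):
    """One prediction/update cycle of Algorithm 1.

    pdv    -- previous PDV Pi_{k-1|k-1}, length 2^d
    M      -- 2^d x 2^d transition matrix M_k for the applied input
    t_diag -- diagonal of the update matrix T_k(y_k)
    states -- the vectors x^1, ..., x^{2^d} as rows (transpose of A)
    """
    pred = M @ pdv                            # prediction step (Pi_{k|k-1})
    beta = t_diag * pred                      # unnormalized update
    post = beta / beta.sum()                  # Pi_{k|k} = beta / ||beta||_1
    mean = states.T @ post                    # A Pi_{k|k}, entries in [0, 1]
    x_hat = (mean > 0.5).astype(int)          # binarized MMSE estimate
    mse = np.minimum(mean, 1.0 - mean).sum()  # ||min{A Pi, (A Pi)^c}||_1
    return post, x_hat, mse
```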
The discount factor places a premium on minimizing the costs of early interventions as opposed to later ones, which is sensible from a medical perspective [7]. Let $\Pi$ be the set of all admissible control policies. We assume in this paper that the set of external inputs $U_{ext}$ is finite, so that $\Pi$ is also finite. The optimal infinite-horizon cost is defined by

$$J^*(j) \,=\, \min_{\mu \in \Pi} J_\mu(j) \qquad (11)$$

for $j = 1, \ldots, 2^d$, where $J_\mu(j)$ is given by (10). The following theorem, which follows from classical results proved in [13], provides necessary and sufficient conditions for optimality of a stationary control policy. Here we assume that the transition matrix $M_k$ of the controlled Markov chain defined in (8) may depend on $k$ only through the input; the dependency on the input is made explicit by writing $M(u)$, while dropping the index $k$.

Theorem 1. (Bellman's Equation) The optimal infinite-horizon cost $J^*$ satisfies:

$$J^*(j) \,=\, \min_{u \in U_{ext}} \left[ g(\mathbf{x}^j, u) + \gamma \sum_{i=1}^{2^d} (M(u))_{ij}\, J^*(i) \right] \qquad (12)$$

for $j = 1, \ldots, 2^d$. Equivalently, $J^*$ is the unique fixed point of the functional $T[J]$ defined by

$$T[J](j) \,=\, \min_{u \in U_{ext}} \left[ g(\mathbf{x}^j, u) + \gamma \sum_{i=1}^{2^d} (M(u))_{ij}\, J(i) \right] \qquad (13)$$

for $j = 1, \ldots, 2^d$, where $J: \{0,1\}^d \rightarrow \mathbb{R}$ is an arbitrary cost function. Furthermore, a stationary policy $\mu^*$ is optimal if and only if it attains the minimum in Bellman's equation (12) for each initial state $\mathbf{x}^j$, that is,

$$T[J^*](j) \,=\, \min_{u \in U_{ext}} \left[ g(\mathbf{x}^j, u) + \gamma \sum_{i=1}^{2^d} (M(u))_{ij}\, J^*(i) \right] \,=\, g\big(\mathbf{x}^j, \mu^*(\mathbf{x}^j)\big) + \gamma \sum_{i=1}^{2^d} \big(M(\mu^*(\mathbf{x}^j))\big)_{ij}\, J^*(i) \qquad (14)$$
for $j = 1, \ldots, 2^d$. The previous theorem suggests a procedure to determine the optimal stationary control policy. Starting with an arbitrary initial cost function $J_0: \{0,1\}^d \rightarrow \mathbb{R}$, run the iteration

$$J_t \,=\, T[J_{t-1}] \qquad (15)$$

until a fixed point is obtained; it can be shown that the iteration will indeed converge to a fixed point [13]. This fixed point is the optimal cost $J^*$, and the corresponding policy defined by (14) is the optimal stationary control policy. In practice, a stopping criterion has to be defined to stop the iteration when the change between consecutive iterates $J_{t-1}$ and $J_t$ is small enough. The procedure is summarized in Algorithm 2.

Algorithm 2 Optimal Infinite-Horizon Control
1: Initialization Step: initialize $J(j)$ arbitrarily, e.g., $J(j) = \max_{i,u} g(\mathbf{x}^i, u)$, for $j = 1, \ldots, 2^d$. Choose a small threshold $\beta$ and discounting factor $\gamma$.
2: Let $\tilde{J}(j) = J(j)$, for $j = 1, \ldots, 2^d$.
3: Let $J(j) = \min_{u \in U_{ext}} \left[ g(\mathbf{x}^j, u) + \gamma \sum_{i=1}^{2^d} (M(u))_{ij}\, \tilde{J}(i) \right]$, for $j = 1, \ldots, 2^d$.
4: Compute $\Delta = \max_{j = 1, \ldots, 2^d} |J(j) - \tilde{J}(j)|$.
5: If $\Delta > \beta$, go to step 2.
6: The optimal stationary control policy is $\mu^*(\mathbf{x}^j) = \arg\min_{u \in U_{ext}} \left[ g(\mathbf{x}^j, u) + \gamma \sum_{i=1}^{2^d} (M(u))_{ij}\, J(i) \right]$, for $j = 1, \ldots, 2^d$.

V. OPTIMAL INTERVENTION STRATEGY

To apply the optimal stationary policy, the states of the system must be known at each time step. However, in most real applications, the states of the system are not directly observable. Therefore, we propose to use the Boolean Kalman Filter to estimate the states for application of the policy. While the optimal policy is determined offline, the BKF is applied online, and the control input dictated by the optimal control policy is applied to the state estimate at each time $k$ (the resulting control policy is thus suboptimal on the space of observed data). The proposed approach is illustrated in Figure 1.

Fig. 1: The proposed scheme for intervention in data observed through noise (blocks: System, BKF, Optimal Policy).

VI. NUMERICAL EXPERIMENT

In this section, we present the results of a numerical experiment using a Boolean network based on the mutated mammalian cell-cycle network.
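Algorithm 2 amounts to value iteration over the $2^d$ states. A minimal sketch, assuming the transition matrices $M(u)$ and cost vectors $g(\cdot, u)$ have been precomputed for each input $u$ (the dictionary keys below are illustrative):

```python
import numpy as np

def value_iteration(M, g, gamma=0.95, beta=1e-6):
    """Algorithm 2 (value iteration for Bellman's equation (12)).

    M[u] -- 2^d x 2^d transition matrix under input u; column j holds
            the distribution of the next state given X_{k-1} = x^j
    g[u] -- vector of costs g(x^j, u) over the states
    """
    inputs = list(M)
    J = np.zeros(M[inputs[0]].shape[0])
    while True:
        # Q[a][j] = g(x^j, u_a) + gamma * sum_i M(u_a)_{ij} J(i)
        Q = np.array([g[u] + gamma * (M[u].T @ J) for u in inputs])
        J_new = Q.min(axis=0)
        if np.abs(J_new - J).max() <= beta:   # stopping criterion (step 5)
            break
        J = J_new
    policy = [inputs[a] for a in Q.argmin(axis=0)]  # attains the minimum
    return J_new, policy
```

For instance, in a one-gene system where keeping the gene in state 1 incurs cost 5 per step and flipping costs 1, the computed policy leaves state 0 alone and flips state 1.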
Mammalian cell division is coordinated with the overall growth of the organism through extracellular signals that control the activation of CycD in the cell. The mutation considered here introduces a situation in which both CycD and Rb may be inactive, in which case the cell cycles (i.e., divides) in the absence of any growth factor [14]. This suggests considering the logical states in which both Rb and CycD are downregulated as undesirable proliferative states. Table I presents the Boolean functions of the mutated cell-cycle network.

TABLE I: Mutated Boolean functions of the mammalian cell cycle.

Gene | Predictor
CycD | Input
Rb | ¬CycD ∧ ¬CycE ∧ ¬CycA ∧ ¬CycB
E2F | ¬Rb ∧ ¬CycA ∧ ¬CycB
CycE | E2F ∧ ¬Rb
CycA | (E2F ∧ ¬Rb ∧ ¬Cdc20 ∧ ¬(Cdh1 ∧ UbcH10)) ∨ (CycA ∧ ¬Rb ∧ ¬Cdc20 ∧ ¬(Cdh1 ∧ UbcH10))
Cdc20 | CycB
Cdh1 | (¬CycA ∧ ¬CycB) ∨ Cdc20
UbcH10 | ¬Cdh1 ∨ (Cdh1 ∧ UbcH10 ∧ (Cdc20 ∨ CycA ∨ CycB))
CycB | ¬Cdc20 ∧ ¬Cdh1

The control input consists of flipping or not flipping the state of a single control gene. The cost function is defined as follows:

$$g(\mathbf{x}^j, u) \,=\, \begin{cases} 5 + O(u), & \text{if } (\mathrm{CycD}, \mathrm{Rb}) = (0, 0) \\ O(u), & \text{if } (\mathrm{CycD}, \mathrm{Rb}) \neq (0, 0) \end{cases}$$

where $O(u)$ is 0 if the state of the control gene is not flipped, and 1 if it is flipped. The process noise is assumed to have independent components distributed as Bernoulli($p$) with intensity $p = 0.01$, so that all genes are perturbed with a small
probability, except the transducer gene CycD, which is perturbed by noise with intensity $p = 0.5$, to simulate the presence or absence of extracellular signals sensed by this gene. The parameters of the RNA-seq model are assumed as follows: the sequencing depth $s$ is equal to 2.875, which corresponds to 50K–100K reads; the baseline expression $\mu$ is set to 0.01; the parameter $\delta_j$ is assumed to be 2 for all transcripts ($j = 1, \ldots, 8$); and, finally, two values of the inverse dispersion parameter, $\phi_j = 0.5$ and $\phi_j = 5$, are used to model highly and lowly dispersed measurements, respectively.

The simulation proceeds as follows: first, the optimal stationary policy is found based on Algorithm 2, with threshold parameter $\beta = 10^{-6}$ and $\gamma = 0.95$. Then an initial state and the corresponding observed RNA-seq measurement are generated. Based on the state estimated by the BKF and the optimal stationary policy obtained initially, the appropriate control input is applied to the system. Based on the current intervention, the next data point is generated. The process continues in an online, closed-loop fashion.

The sum of the state values of the CycD and Rb genes, for the system under control of the Rb gene and without control, is presented in Figure 2 over 80 time steps. It can be seen that the simultaneous inactivation of both CycD and Rb genes occurred several times for the uncontrolled system; this undesirable condition is visited less often by the system under control of the Rb gene. Furthermore, better control can be seen in the presence of lowly dispersed data. The reason is that the performance of state estimation is worse for highly dispersed data, leading to lower control performance as well.
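The online closed loop just described can be sketched for a toy one-gene system. The Gaussian read-out below is a deliberate stand-in for the RNA-seq observation model, and all numbers are illustrative, not the parameters of the experiment above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Closed-loop sketch (illustrative): the policy was computed offline on
# the state space, the BKF tracks the hidden state online, and the input
# chosen for the *estimated* state drives the real system.
p = 0.05                                   # process noise intensity
states = np.array([0, 1])
M = {0: np.array([[1 - p, p], [p, 1 - p]]),    # u = 0: keep the gene
     1: np.array([[p, 1 - p], [1 - p, p]])}    # u = 1: flip the gene
policy = {0: 0, 1: 1}                      # drive the gene toward state 0

def likelihood(y):
    # update-matrix diagonal for observation y (Gaussian stand-in)
    return np.exp(-0.5 * ((y - states) / 0.4) ** 2)

pdv = np.array([0.5, 0.5])                 # prior over {0, 1}
x = 1                                      # true (hidden) initial state
for k in range(100):
    x_hat = int(states @ pdv > 0.5)        # binarized MMSE estimate
    u = policy[x_hat]                      # policy applied to the estimate
    x = (x ^ u) ^ int(rng.random() < p)    # true system evolves under u
    y = x + 0.4 * rng.standard_normal()    # noisy observation of x
    pdv = likelihood(y) * (M[u] @ pdv)     # BKF prediction + update
    pdv /= pdv.sum()
```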
Next, the results of the proposed method under control of the Rb gene, observed through simulated RNA-seq data, are compared with the results of the value iteration method with directly observed states. The fractions of observed desirable states over 5000 time steps, for data with high and low dispersion and different process noise, are shown in Figure 3. It should be noted that, since the stationary policy depends on the process noise, a different policy is obtained for each value of the process noise. As expected, the value iteration method with directly observed states has higher performance than the system with partially observed states. In addition, the fractions of observed desirable states for systems observed through low-dispersion data are very close to those of the value iteration method; for data with high dispersion, however, the accuracy of state estimation by the BKF is smaller, which degrades control performance.

Fig. 2: Sum of state values of the CycD and Rb genes over 80 time steps, for the system under control of the Rb gene and without control: (a) lowly dispersed data; (b) highly dispersed data.

Fig. 3: Fraction of observed desirable states for the system with directly observed states (VI), the proposed method with observations through RNA-seq data (VI-BKF), and the system without control (low dispersion: $\phi = 5$; high dispersion: $\phi = 0.5$).

The fraction of observed desirable states using different genes as the control input, over 5000 time steps, is shown in Figure 4. It is clear that Rb is the best control gene for this system.
Fig. 4: Fraction of desirable states using different control genes (and no control), for process noise $p = 0.01$ and $\phi_j = 5$ for all $j$.

VII. CONCLUSION

In this paper, an intervention strategy for partially-observed Boolean dynamical systems was discussed. The method consists of two main steps: in the first step, the optimal control policy for the system with directly observed states is obtained; in the second step, the obtained control policy is applied to the system based on the states estimated by the Boolean Kalman Filter. The performance of the method was evaluated using a Boolean network model of the mutated mammalian cell cycle and simulated RNA-seq observations.

ACKNOWLEDGMENT

The authors acknowledge the support of the National Science Foundation, through NSF award CCF-1320884.

REFERENCES

[1] A. Datta, A. Choudhary, M. L. Bittner, and E. R. Dougherty, "External control in Markovian genetic regulatory networks," Machine Learning, vol. 52, no. 1-2, pp. 169-191, 2003.
[2] S. A. Kauffman, "Metabolic stability and epigenesis in randomly constructed genetic nets," Journal of Theoretical Biology, vol. 22, no. 3, pp. 437-467, 1969.
[3] I. Shmulevich, E. R. Dougherty, and W. Zhang, "From Boolean to probabilistic Boolean networks as models of genetic regulatory networks," Proceedings of the IEEE, vol. 90, no. 11, pp. 1778-1792, 2002.
[4] U. Braga-Neto, "Optimal state estimation for Boolean dynamical systems," in Conference Record of the Forty-Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 1050-1054, IEEE, 2011.
[5] E. O. Voit and J. Almeida, "Decoupling dynamical systems for pathway identification from metabolic profiles," Bioinformatics, vol. 20, no. 11, pp. 1670-1681, 2004.
[6] N. Friedman, M. Linial, I. Nachman, and D. Pe'er, "Using Bayesian networks to analyze expression data," Journal of Computational Biology, vol. 7, no. 3-4, pp. 601-620, 2000.
[7] R. Pal, A. Datta, and E. R.
Dougherty, "Optimal infinite-horizon control for probabilistic Boolean networks," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2375-2387, 2006.
[8] Á. Halász, M. S. Sakar, H. Rubin, V. Kumar, G. J. Pappas, et al., "Stochastic modeling and control of biological systems: the lactose regulation system of Escherichia coli," IEEE Transactions on Automatic Control, vol. 53, Special Issue, pp. 51-65, 2008.
[9] G. Vahedi, B. Faryabi, J.-F. Chamberland, A. Datta, and E. R. Dougherty, "Optimal intervention strategies for cyclic therapeutic methods," IEEE Transactions on Biomedical Engineering, vol. 56, no. 2, pp. 281-291, 2009.
[10] M. Imani and U. Braga-Neto, "Optimal state estimation for Boolean dynamical systems using a Boolean Kalman smoother," in 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 972-976, IEEE, 2015.
[11] M. Imani and U. Braga-Neto, "Optimal gene regulatory network inference using the Boolean Kalman filter and multiple model adaptive estimation," in 2015 49th Asilomar Conference on Signals, Systems and Computers, pp. 423-427, IEEE, 2015.
[12] A. Bahadorinejad and U. Braga-Neto, "Optimal fault detection and diagnosis in transcriptional circuits using next-generation sequencing," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2015.
[13] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont, MA, 1995.
[14] M. R. Yousefi, A. Datta, and E. R. Dougherty, "Optimal intervention strategies for therapeutic methods with fixed-length duration of drug effectiveness," IEEE Transactions on Signal Processing, vol. 60, no. 9, pp. 4930-4944, 2012.