Sparse, stable gene regulatory network recovery via convex optimization
Arwen Meister
June 2011

Gene regulatory networks

Gene expression regulation allows cells to control protein levels in order to live and grow. A major mode of gene regulation is the binding of proteins called transcription factors, encoded by particular genes, to specific DNA promoter sites [Bintu 2005]. Genes regulated in this way may in turn encode other regulatory proteins. We can model these interactions as a network in which regulatory genes are nodes and edges represent regulatory relationships [Zhou 2007]. We wish to infer gene regulatory networks from experimental data.

Dynamical systems model

We model the cell state as a time-varying vector x(t) ∈ R^n of gene expression levels that evolves according to dx/dt = A(x(t)), where A : R^n → R^n is a smooth nonlinear function. Equilibrium points µ such that A(µ) = 0 correspond to basic cell types such as embryonic stem cells or liver cells. (During its lifecycle a cell may move through many equilibria.) Taylor expanding A about an equilibrium µ yields

    dx/dt = A(x) ≈ T(x − µ)   ⟹   x(t) ≈ µ + e^{tT}(x_0 − µ),    (1)

where T is the n × n Jacobian matrix of A at µ, x_0 is the initial state, and x is close to µ. The matrix T models the regulatory network at equilibrium: T_{ij} > 0 if gene j up-regulates gene i, and T_{ij} < 0 corresponds to down-regulation. The diagonal of T reflects not only self-regulation but also degradation of gene products. (We assume that gene products degrade at a known fixed rate γ.) Our goal is to infer T at a particular equilibrium µ using perturbation data and structural knowledge.

We know the regulatory network is sparse, since each regulator has only a few targets; that is, T + γI has small cardinality (taking the degradation rate γ into account on the diagonal). Furthermore, we know the equilibrium is stable, since the cell recovers from small perturbations [Lacy 2003].
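The linearized dynamics in equation (1) can be simulated directly. The sketch below uses a hypothetical two-gene network with illustrative values for T, µ, γ, and the perturbation (none of these numbers come from the paper); it shows a small perturbation decaying back to the equilibrium µ under x(t) = µ + e^{tT}(x_0 − µ).

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 2-gene network: gene 1 up-regulates gene 2, and both
# gene products degrade at rate gamma (illustrative values only).
gamma = 1.0
T = np.array([[-gamma, 0.0],
              [0.8,   -gamma]])
mu = np.array([1.0, 1.0])          # assumed equilibrium expression levels
x0 = mu + np.array([0.3, 0.0])     # small perturbation of gene 1

def trajectory(t):
    """Linearized dynamics: x(t) = mu + e^{tT} (x0 - mu)."""
    return mu + expm(t * T) @ (x0 - mu)

print(trajectory(0.0))   # the initial state x0
print(trajectory(5.0))   # close to mu, since T is stable
```

Because T here is stable, the perturbation decays and the trajectory returns to µ, which is exactly the behavior the stability constraint later enforces on the recovered network.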
Mathematically, µ is stable if there exists a Lyapunov matrix P ≻ 0 such that PT + T^T P ≺ 0 [Walker 1939], or equivalently, if the eigenvalues of T all have negative real parts.
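Both views of stability can be checked numerically. The sketch below uses an illustrative 2×2 matrix (not taken from the paper's network): the eigenvalue test, and a Lyapunov certificate P obtained by solving T^T P + P T = −I with SciPy, which is solvable with P ≻ 0 exactly when T is stable.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative stable matrix (not from the paper's data).
T = np.array([[-1.0,  0.5],
              [ 0.0, -0.5]])

# Eigenvalue view of stability: all real parts negative.
assert np.max(np.linalg.eigvals(T).real) < 0

# Lyapunov view: solving T^T P + P T = -I yields a certificate P ≻ 0
# with P T + T^T P = -I ≺ 0, matching the condition in the text.
P = solve_continuous_lyapunov(T.T, -np.eye(2))
P = (P + P.T) / 2                       # symmetrize against round-off
assert np.all(np.linalg.eigvalsh(P) > 0)
M = P @ T + T.T @ P
assert np.allclose(M, -np.eye(2))
print("Lyapunov certificate found")
```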
Convex modeling

To recover the network matrix T from noisy measurements x(t), we must solve

    minimize    ‖x(t) − µ − e^{tT}(x_0 − µ)‖
    subject to  card(T + γI) ≤ k
                PT + T^T P ≺ 0

with variables T ∈ R^{n×n}, P ∈ R^{n×n}, and data t, γ ∈ R and µ, x_0, x(t) ∈ R^n. The objective ensures that T is consistent with the data x(t). The first constraint enforces network sparsity, and the second enforces stability of the equilibrium µ.

To formulate a convex problem, we replace the exponential in the objective with the linearization e^{tT} ≈ I + tT, and the cardinality constraint with an l1 term [Tibshirani 1996]:

    minimize    ‖(x(t) − x_0) − tT(x_0 − µ)‖ + λ Σ_{i,j} |(T + γI)_{ij}|
    subject to  PT + T^T P ≺ 0

with variables T, P. The problem is not jointly convex in T and P, so we will use an iterative heuristic to solve it approximately [Zavlanos 2010]: we alternately fix one variable and solve in the other, starting with P = I fixed.

For simplicity, we gave the formulation for one perturbation x_0 and one measurement, while we actually need at least n perturbations to recover T ∈ R^{n×n}, and might have several measurements. Assuming N perturbations x_0^{(j)} leading to trajectories x^{(j)}(t), j = 1, ..., N, and m measurements per trajectory, the problem data are µ ∈ R^n, γ ∈ R, t_i ∈ R, and x_0^{(j)}, x^{(j)}(t_i) for j = 1, ..., N, i = 1, ..., m, and the complete problem is

    minimize    Σ_{j=1}^N Σ_{i=1}^m ‖(x^{(j)}(t_i) − x_0^{(j)}) − t_i T(x_0^{(j)} − µ)‖ + λ Σ_{i,j} |(T + γI)_{ij}|
    subject to  PT + T^T P ≺ 0

with variables T, P.

Problem data

The problem data come from noisy genome-wide expression measurements taken shortly after a gene knockdown, in which the expression level of one gene is reduced to a fixed level. Modeling a knockdown as a small perturbation and the subsequent evolution as an exponential trajectory is a very poor approximation, but data fitting combined with regularization may still allow approximate network recovery. Recovering the diagonal of T is particularly challenging, since the knockdowns fix gene expression at a reduced level, thereby preventing direct detection of self-regulation.
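Without the stability constraint, the l1-regularized linearized fit can be solved by simple proximal gradient descent (ISTA). The sketch below is a minimal version on synthetic data, using a squared-norm variant of the loss; the network T_true, sizes, noise level, and λ are all illustrative choices, not the paper's. The prox step handles the shifted penalty λ‖T + γI‖_1 by soft-thresholding T + γI and shifting back.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 8                       # genes, perturbation experiments (toy sizes)
gamma, t, lam = 1.0, 0.1, 1e-3    # degradation rate, sample time, l1 weight

# Illustrative sparse ground truth: degradation on the diagonal plus two edges.
T_true = -gamma * np.eye(n)
T_true[1, 0], T_true[2, 3] = 0.9, -0.7

mu = np.ones(n)
X0 = mu[:, None] + 0.2 * rng.standard_normal((n, N))        # perturbed starts
D = X0 - mu[:, None]                                        # columns x0^(j) - mu
B = t * T_true @ D + 0.005 * rng.standard_normal((n, N))    # noisy x(t) - x0

def objective(T):
    return (np.linalg.norm(B - t * T @ D, "fro") ** 2
            + lam * np.abs(T + gamma * np.eye(n)).sum())

def soft(A, thr):
    return np.sign(A) * np.maximum(np.abs(A) - thr, 0.0)

# ISTA: gradient step on the squared loss, then the prox of the shifted
# l1 penalty lam * ||T + gamma*I||_1.
eta = 1.0 / (2 * t**2 * np.linalg.norm(D @ D.T, 2))   # step = 1 / Lipschitz
T = np.zeros((n, n))
for _ in range(1000):
    grad = -2 * t * (B - t * T @ D) @ D.T
    T = soft((T - eta * grad) + gamma * np.eye(n), eta * lam) - gamma * np.eye(n)

print(np.round(T, 2))   # sparse estimate of the network matrix
```

With a step size of 1/L, ISTA decreases the objective monotonically, so the final iterate fits the data strictly better than the zero initialization. Adding the LMI constraint PT + T^T P ≺ 0 requires a semidefinite-programming solver and is handled separately by the alternating heuristic.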
Multiple time points can yield indirect information, since perturbing a regulator at time t_0 leads to perturbed targets at time t_1, and we can observe the effects of target self-regulation at time t_2. However, this signal is very weak compared to the direct signal, so regularization is especially important for the diagonal.
Figure 1: Network structure; T_true; basic clean recovery; basic noisy recovery.

We test our approach on a simulation of a six-gene subnetwork in embryonic stem cells, for which the network matrix T_true is known [Chickarmane 2008]. The network and T_true are shown in Figure 1. To generate data, we fix each gene in turn at 5% of its equilibrium level and let the others evolve, sampling x(t_1), x(t_2) for small t_1, t_2. To generate noisy versions of the data, we add 1% Gaussian noise to the signal.

Basic recovery

We first try basic recovery, minimizing ‖(x(t) − x_0) − tT(x_0 − µ)‖ without enforcing sparsity or stability (Figure 1). In the noiseless case, the recovery works well: the matrix is not quite sparse or stable, but it has many nearly-zero entries and only one small positive eigenvalue. With noisy data, T_recovered is still nearly sparse, but the diagonal is not recovered and the matrix has large positive eigenvalues, violating the stability constraint.

Enforcing sparsity

For noisy data, l1 regularization may improve both sparsity and diagonal recovery. We tune the sparsity parameter λ with leave-one-out cross-validation: omitting each knockdown in turn, fitting on the other five, and testing on the omitted data. We then average the prediction error over all the test sets. Figure 2 (left) shows the results for several noisy data instances. The error drops sharply at around λ = 0.09; further increasing λ does not significantly change the error, but choosing λ too large makes the recovery too sparse. λ = 0.1 seems a reasonable choice, and the plots of the absolute error and sparsity of T versus λ in Figure 2 confirm that it provides a good tradeoff between accuracy and sparsity.

Enforcing stability

We enforce the stability constraint using an iterative heuristic in which we solve alternately in T and P, starting with P = I. The iterates are always feasible (P_k T_k + T_k^T P_k ≺ 0 for all k), but the iteration is not guaranteed to converge to the solution, nor are there non-heuristic
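The data-generation procedure can be sketched as follows: clamp one gene at a fraction of its equilibrium level and integrate the linearized dynamics of the remaining genes. The network, sizes, time points, and step size below are illustrative stand-ins, not the paper's six-gene model.

```python
import numpy as np

# Toy linearized dynamics dx/dt = T (x - mu), with one gene clamped.
# All values are illustrative, not the paper's six-gene network.
n = 4
gamma = 1.0
T = -gamma * np.eye(n)
T[1, 0], T[3, 2] = 0.8, -0.6
mu = np.ones(n)

def knockdown(k, level=0.05, t_samples=(0.1, 0.2), dt=1e-3):
    """Fix gene k at level*mu[k], let the others evolve (forward Euler),
    and sample the state at each time in t_samples."""
    x = mu.copy()
    x[k] = level * mu[k]
    samples, t = [], 0.0
    for t_target in t_samples:
        while t < t_target:
            dx = T @ (x - mu)
            dx[k] = 0.0            # the knocked-down gene is held fixed
            x = x + dt * dx
            t += dt
        samples.append(x.copy())
    return samples

x_t1, x_t2 = knockdown(0)
print(np.round(x_t1, 3), np.round(x_t2, 3))
```

Note that the knocked-down gene stays clamped throughout, which is why self-regulation of that gene leaves no direct trace in its own trajectory, as discussed above.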
Figure 2: Sparsity parameter selection. Cross-validation error for several noisy instances (left); absolute error in T_recovered compared to T_true versus λ (center); sparsity of T_recovered versus λ (right).

stopping criteria. We terminate when ‖T_k − T_{k−1}‖ ≤ ǫ for some tolerance ǫ:

    P_0 = I; k = 1;
    while ‖T_k − T_{k−1}‖ ≥ ǫ:
        T_k = argmin_T  ‖(x(t) − x_0) − tT(x_0 − µ)‖ + λ Σ_{i,j} |(T + γI)_{ij}|
              subject to  P_{k−1} T + T^T P_{k−1} ≺ 0
        P_k = find P ≻ 0 subject to P T_k + T_k^T P ≺ 0                  (2)
        k = k + 1

We test on noisy data with λ = 0.1. Plots of the objective value and maximum eigenvalue of the iterates T_k are shown in Figure 3. The optimal objective value is unknown, but the objective values of the iterates do appear to converge to the objective value of the matrix recovered from the same noisy data instance without enforcing stability. T_true has a higher objective value: since our model is only approximate, the recovered matrices fit it better than T_true does. Since stability is equivalent to Re(λ_i(T)) < 0 for all i, the maximum eigenvalue of T_k provides a measure of stability. The maximum eigenvalues of the iterates increase quickly to just below zero, so the stability condition is not unnecessarily strict in the end.

Nonlinear approximations of the exponential

Initially, we used the linearization e^{tT} ≈ I + tT to form a convex objective. The Cayley transform C_{tT} = (I − ½tT)^{-1}(I + ½tT) is an attractive alternative, since it is a quadratically accurate model of e^{tT} and, inserted into our problem, yields the convex objective ‖(I − ½tT)(x(t) − µ) − (I + ½tT)(x_0 − µ)‖. The Cayley transform is a matrix generalization of the bilinear function (1 + ½z)/(1 − ½z), which is quadratically close to e^z for small z (Figure 4, top left). To see the basic idea, we assume that T is
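The T-step of the heuristic requires a solver that handles the LMI constraint, but the P-step has an explicit sketch: given a stable iterate T_k, solving the Lyapunov equation T_k^T P + P T_k = −I produces a feasible P_k ≻ 0. The iterate T_k below is an illustrative stand-in for a stable matrix produced by the T-step.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def p_step(T_k):
    """P-update of the alternating heuristic: given a stable iterate T_k,
    solve the Lyapunov equation T_k^T P + P T_k = -I, whose solution is a
    feasible P ≻ 0 with P T_k + T_k^T P ≺ 0."""
    P = solve_continuous_lyapunov(T_k.T, -np.eye(T_k.shape[0]))
    return (P + P.T) / 2   # symmetrize against round-off

# Illustrative stable iterate (not from the paper's data).
T_k = np.array([[-0.5,  0.2],
                [ 0.1, -0.8]])
P_k = p_step(T_k)

M = P_k @ T_k + T_k.T @ P_k
assert np.all(np.linalg.eigvalsh(P_k) > 0)   # P_k ≻ 0
assert np.max(np.linalg.eigvalsh(M)) < 0     # feasibility certificate
print("P-step produced a feasible Lyapunov matrix")
```

This is why the iterates remain feasible at every step: each P_k certifies the stability of the T_k that produced it.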
Figure 3: Stability iteration. Objective value of the iterates T_k (left); maximum eigenvalue (real part) of the iterates T_k (right).

diagonalizable and take the eigenvalue decomposition T = VΛV^{-1}:

    C_T = (I − ½T)^{-1}(I + ½T)
        = V(I − ½Λ)^{-1}(I + ½Λ)V^{-1}
        = V diag( (1 + ½λ_i)/(1 − ½λ_i) ) V^{-1}

so that

    ‖e^T − C_T‖ ≤ κ(V) max_i |e^{λ_i} − (1 + ½λ_i)/(1 − ½λ_i)| = O(λ_max(T)^3 κ(V)).

Hence the Cayley transform is quadratically close to the matrix exponential. Figure 4 (top left) shows that the bilinear function models e^z very well for |z| < 1, so we expect the Cayley transform to model e^{tT} well if λ_max(T) < 1/t. Figure 4 (top right) confirms this.

To further improve the model, we can refine any estimate T̂ by minimizing ‖x(t) − e^{t(T̂+δ)} x(0)‖ over δ ∈ R^{n×n} and setting T = T̂ + δ. (We can replace e^{t(T̂+δ)} with the linearization e^{tT̂} + tδ + ½t²δT̂ + ½t²T̂δ to get a convex problem.) When we tested these methods on samples from an exponential trajectory, we found that the Cayley model was quadratically accurate, and the δ-refinement added two digits of accuracy to either estimate. Figure 4 (bottom row) shows a trajectory generated by the T recovered from each model; the δ-refined Cayley model dramatically outperforms the linear model.

Unfortunately, we saw no improvement at all when we applied these methods to the real problem data. We later found that the knockdown data fits the linear model better than the exponential one. This is reasonable, since knockdowns are not only dramatic perturbations, but also change the structure of the network by fixing one variable. Since the data does not follow a true exponential trajectory, we may as well use a linear model.
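The accuracy gap between the two approximations is easy to see numerically. The sketch below compares I + A and the Cayley transform (I − A/2)^{-1}(I + A/2) against the true matrix exponential for a random matrix with small norm (the size and scaling are illustrative choices).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A *= 0.1 / np.linalg.norm(A, 2)          # small norm, where both models apply

E = expm(A)
linear = np.eye(5) + A                                  # e^A ≈ I + A
cayley = np.linalg.solve(np.eye(5) - A / 2,
                         np.eye(5) + A / 2)             # e^A ≈ (I - A/2)^{-1}(I + A/2)

err_lin = np.linalg.norm(E - linear)     # first-order error, O(||A||^2)
err_cay = np.linalg.norm(E - cayley)     # second-order error, O(||A||^3)
print(err_lin, err_cay)
```

For ‖A‖ = 0.1 the Cayley error is smaller by roughly a factor of ‖A‖, matching the O(λ_max(T)^3) bound above.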
Figure 4: Nonlinear approximations of the exponential. Approximation of the scalar exponential e^z by the bilinear vs. linear function (top left); recovery of a scalar exponent z from samples of e^{tz} (with t = 0.1, 0.2) using the linear vs. Cayley model (top right); predicted trajectory of a single variable using T recovered with the linear, Cayley, and δ-refined models (bottom left); close-up of the trajectory (bottom right).

Figure 5: Successful recovery. T_true (left; 16 zeros and Re(λ_max) = −0.3); T recovered from noisy data with sparsity and stability (right; 17 zeros and Re(λ_max) = −0.6).
Conclusion

We can recover T from noisy data quite successfully using the linear model with l1 regularization and iterative enforcement of the stability constraint. Figure 5 shows a matrix recovered from noisy data using this method: it has the right sparsity level, corresponds to a stable equilibrium at µ, and captures the off-diagonal entries of T_true very well and the diagonal reasonably well. Because the knockdowns do not truly follow an exponential model, we gain nothing by using sophisticated approximations of the exponential. However, the sparsity and stability constraints are very helpful, both imparting desired properties to the network and regularizing the solution, greatly improving network recovery from noisy data.

References

1. Bintu, L., Buchler, N.E., Garcia, H.G., Gerland, U., Hwa, T., et al. (2005) Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15:116-124.
2. Boyd, S. & Vandenberghe, L. (2004) Convex Optimization. Cambridge University Press, Cambridge.
3. Chickarmane, V. & Peterson, C. (2008) A Computational Model for Understanding Stem Cell, Trophectoderm and Endoderm Lineage Determination. PLoS ONE 3(10):e3478.
4. Lacy, S.L. & Bernstein, D.S. (2003) Subspace Identification With Guaranteed Stability Using Constrained Optimization. IEEE Trans. Automat. Control 48(7):1259-1263.
5. Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B 58(1):267-288.
6. Walker, J.A. (1939) Dynamical Systems and Evolution Equations. Plenum Press, New York.
7. Zhou, Q., Chipperfield, H., Melton, D.A. & Wong, W.H. (2007) A gene regulatory network in mouse embryonic stem cells. Proc. Natl. Acad. Sci. USA 104:16438-16443.
8. Zavlanos, M., Julius, A., Boyd, S.P. & Pappas, G.J. (2010) Inferring Stable Genetic Networks from Steady-State Data. Preprint submitted to Automatica.