LINEAR COMPARTMENTAL MODELS: INPUT-OUTPUT EQUATIONS AND OPERATIONS THAT PRESERVE IDENTIFIABILITY

ELIZABETH GROSS, HEATHER HARRINGTON, NICOLETTE MESHKAT, AND ANNE SHIU

Abstract. This work focuses on the question of how identifiability of a mathematical model, that is, whether parameters can be recovered from data, is related to identifiability of its submodels. We look specifically at linear compartmental models and investigate when identifiability is preserved after adding or removing model components. In particular, we examine whether identifiability is preserved when an input, output, edge, or leak is added or deleted. Our approach, via differential algebra, is to analyze specific input-output equations of a model and the Jacobian of the associated coefficient map. We clarify a prior determinantal formula for these equations, and then use it to prove that, under some hypotheses, a model's input-output equations can be understood in terms of certain submodels we call "output-reachable". Our proofs use algebraic and combinatorial techniques.

Keywords: identifiability, linear compartmental model, input-output equation, matrix-tree theorem

1. Introduction

Identifiability refers to the problem of whether parameters of a mathematical model can be recovered from data. We focus on structural identifiability, that is, whether the model equations allow estimation of a finite number of parameters from noise-free data, henceforth referred to as identifiability. Here we investigate the identifiability problem for a class of models used extensively in biological applications, namely, linear compartmental models [5, 8].

Our interest is in the following question: When is identifiability preserved after components of the model such as inputs, outputs, leaks, or edges are added or removed? For example, in a prior work [10], sufficient conditions are given for when edges can be removed. In this work, we show that adding outputs or inputs or, under certain hypotheses, adding or removing a leak preserves identifiability.
These results, as well as additional new contributions, are summarized in Tables 1 and 2 below.

The question of when a submodel of an identifiable model is identifiable has both theoretical significance and real-life applications (e.g., identifiability of nested models in epidemiology [16]). On the theory side, the investigation of submodels of identifiable models was initiated by Vajda and others [17, 18] with the goal of reducing the problem of identifiability to simpler and more tractable computations. Since then, it has remained an interesting problem, as it addresses one of the most important questions regarding linear compartmental models: Which models are identifiable?

In applications, deleting an input or output corresponds to changing the experimental setup or changing the design of a biological circuit within a cell. Hence, if we know the corresponding submodel remains identifiable, the new setup will not affect the ability to recover parameters.

Date: August 1, 2018.

As for edges and leaks, deleting these elements is a way to model a biological intervention or knockout. Also, in practice, the precise model architecture may be unknown, so being able to investigate identifiability of a few candidate models at once is desirable.

Another motivation for our work is its application to chemical reaction networks. Linear compartmental models correspond to monomolecular chemical reaction networks, and we will use the results here to analyze, in subsequent work [9], when identifiability is preserved when networks are joined or decomposed.

For linear compartmental models, the problem of determining (generic local) identifiability can be translated, through standard differential-algebra techniques, to the question of whether the Jacobian matrix of the coefficient map (arising from some input-output equations) is generically full rank. A formula for the input-output equations for models with at least one input was given by Meshkat, Sullivant, and Eisenberg [14]. Here we clarify their formula (see Proposition 2.3 and Remark 2.7), and then use that result to explain how input-output equations can be read off from the input-output equations of submodels arising from what we call output-reachable subgraphs (Theorem 3.8).

Operation     | Operation preserves identifiability?
Add input     | Yes* (Proposition 4.1)
Add output    | Yes* (Proposition 4.1)
Add leak      | Not always (see Example 5.1); yes, under certain hypotheses (Theorem 4.3)
Add edge      | Not always (see Example 5.3)
Delete input  | Not always (see Example 5.4)
Delete output | Not always (see Example 5.5)
Delete leak   | Open (see Question 5.2); yes, under certain hypotheses (Proposition 4.6)
Delete edge   | Not always (see Example 5.3); yes, under certain hypotheses [10, Theorem 3.1]

Table 1. Operations on linear compartmental models, and whether they preserve identifiability. Here, * pertains to models with at least one input.

Operation     | Operation preserves unidentifiability?
Delete input  | Yes* (Proposition 4.1)
Delete output | Yes* (Proposition 4.1)
Delete leak   | Not always (see Example 5.1); yes, under certain hypotheses (Theorem 4.3)
Delete edge   | Not always (see Example 5.3)
Add input     | Not always (see Example 5.4)
Add output    | Not always (see Example 5.5)
Add leak      | Open (see Question 5.2); yes, under certain hypotheses (Proposition 4.6)
Add edge      | Not always (see Example 5.3)

Table 2. Operations on linear compartmental models, and whether they preserve unidentifiability. Here, * pertains to models with at least one input.

We apply our results on input-output equations to investigate whether identifiability is preserved when a leak is added (Theorem 4.3) or removed (Proposition 4.6).

The effect on identifiability of adding or deleting a component is not always predictable, as seen in Tables 1 and 2. The rows of these two tables are similar because, for instance, losing identifiability when a leak is added can also be viewed as gaining identifiability when the leak is removed.

Remark 1.1. Tables 1 and 2 are nearly identical, except that, in the last line, [10, Theorem 3.1] does not apply to unidentifiable models: that theorem makes use of the singular-locus equation, which is not defined for unidentifiable models.

The outline of our work is as follows. In Section 2, we introduce linear compartmental models and define identifiability. In Section 3, we prove our results on input-output equations. In Section 4, we prove results on how identifiability is affected by adding or removing inputs, outputs, or leaks; and then we give some related examples in Section 5. We conclude with a discussion in Section 6.

2. Background

In this section, we recall linear compartmental models, their input-output equations, and the concept of identifiability. We follow closely the notation in [10].

A linear compartmental model is defined by a directed graph $G = (V, E)$ and three sets $In, Out, Leak \subseteq V$. Each vertex $i \in V$ is a compartment in the model, while each edge $j \to i$ represents the flow of material from the $j$-th compartment to the $i$-th compartment. The sets $In$, $Out$, and $Leak$ are the sets of input, output, and leak compartments, respectively. Each compartment $i \in In$ has an external input $u_i(t)$ driving the system, whereas each compartment $j \in Out$ is measurable. A compartment $k \in Leak$ has some constant rate of flow that leaves the system (i.e., does not flow to any other compartment). We assume that $Out$ is nonempty (as, otherwise, no variables can be observed and hence the model parameters cannot be recovered).

Consistent with the literature, we indicate output compartments in figures by a dedicated symbol (not reproduced in this text). Input compartments are labeled by "in", and leaks are indicated by outgoing edges. For instance, the linear compartmental model in Figure 1 has $In = Out = \{1, 3\}$ and $Leak = \{1, 4\}$.

[Figure 1. A linear compartmental model. Compartments 1, 2, 3, 4; edges 1 → 2 ($a_{21}$), 2 → 1 ($a_{12}$), 2 → 3 ($a_{32}$), 3 → 4 ($a_{43}$), 4 → 3 ($a_{34}$); leaks from compartment 1 ($a_{01}$) and compartment 4 ($a_{04}$).]

To every edge $j \to i$ in $G$, we associate a parameter $a_{ij}$, the flow rate from compartment-$j$ to compartment-$i$.
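To every leak node $i \in Leak$, we associate a parameter $a_{0i}$, the flow rate from compartment-$i$ leaving the system. Letting $n = |V|$, the compartmental matrix of a linear compartmental model $(G, In, Out, Leak)$ is the $n \times n$ matrix $A$ with entries as follows:
\[
A_{ij} :=
\begin{cases}
-a_{0i} - \sum_{k \,:\, i \to k \in E} a_{ki} & \text{if } i = j \text{ and } i \in Leak, \\
-\sum_{k \,:\, i \to k \in E} a_{ki} & \text{if } i = j \text{ and } i \notin Leak, \\
a_{ij} & \text{if } j \to i \text{ is an edge of } G, \\
0 & \text{otherwise.}
\end{cases}
\]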
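The definition of the compartmental matrix can be sketched in a few lines of code. The following is an illustrative sketch only: the function name `compartmental_matrix`, the dictionary encoding of edges and leaks, and the numeric rates are assumptions introduced here, and the edge set used for Figure 1 is the one read off from the ODEs of Example 2.1.

```python
from fractions import Fraction

def compartmental_matrix(n, edges, leaks):
    """Return the n x n compartmental matrix A as a list of rows.

    edges: dict mapping (j, i) -> rate a_ij of the edge j -> i (1-based labels).
    leaks: dict mapping i -> rate a_0i of the leak from compartment i.
    """
    A = [[Fraction(0)] * n for _ in range(n)]
    for (j, i), rate in edges.items():
        A[i - 1][j - 1] += Fraction(rate)  # off-diagonal entry a_ij: flow into i from j
        A[j - 1][j - 1] -= Fraction(rate)  # the same flow leaves compartment j
    for i, rate in leaks.items():
        A[i - 1][i - 1] -= Fraction(rate)  # the leak a_0i also leaves compartment i
    return A

# Figure 1 with hypothetical numeric rates (chosen only for illustration):
# a21 = 2, a12 = 3, a32 = 5, a43 = 7, a34 = 11, a01 = 13, a04 = 17.
edges = {(1, 2): 2, (2, 1): 3, (2, 3): 5, (3, 4): 7, (4, 3): 11}
leaks = {1: 13, 4: 17}
A = compartmental_matrix(4, edges, leaks)

# Each column sums to minus the leak rate of that compartment (0 if no leak).
column_sums = [sum(A[r][c] for r in range(4)) for c in range(4)]
```

The column-sum check reflects conservation of material: whatever leaves compartment $j$ along an edge arrives in another compartment, so only leaks contribute to a nonzero column sum.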

A linear compartmental model $(G, In, Out, Leak)$ defines a system of linear ODEs (with inputs $u_i(t)$) and outputs $y_i(t)$ given by:
\[
x'(t) = A x(t) + u(t), \qquad y_i(t) = x_i(t) \ \text{ for } i \in Out, \tag{1}
\]
where $u_i(t) \equiv 0$ for $i \notin In$.

Example 2.1. For the model in Figure 1, the ODEs (1) are given by:
\[
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix}
=
\begin{pmatrix}
-a_{01} - a_{21} & a_{12} & 0 & 0 \\
a_{21} & -a_{12} - a_{32} & 0 & 0 \\
0 & a_{32} & -a_{43} & a_{34} \\
0 & 0 & a_{43} & -a_{04} - a_{34}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
+
\begin{pmatrix} u_1 \\ 0 \\ u_3 \\ 0 \end{pmatrix}, \tag{2}
\]
with output equations $y_1 = x_1$ and $y_3 = x_3$.

Definition 2.2. A directed graph $G$ is strongly connected if there exists a directed path from each vertex to every other vertex. A strong component of a directed graph $G$ is a strongly connected, induced subgraph of $G$ that is maximal with respect to inclusion. A linear compartmental model $(G, In, Out, Leak)$ is strongly connected if $G$ is strongly connected.

2.1. Input-output equations.

An input-output equation of a linear compartmental model is an equation that holds along every solution of the ODEs (1) and involves only the parameters $a_{ij}$, input variables $u_i$, output variables $y_i$, and their derivatives. A general form of these equations is given in the following result, which is largely due to Meshkat, Sullivant, and Eisenberg [14, Theorem 2] (our contribution is explained in Remark 2.7):

Proposition 2.3. Let $M = (G, In, Out, Leak)$ be a linear compartmental model with $n$ compartments and at least one input. Define $\partial I$ to be the $n \times n$ matrix in which every diagonal entry is the differential operator $d/dt$ and every off-diagonal entry is 0. Let $A$ be the compartmental matrix, and let $(\partial I - A)_{ji}$ denote the $(n-1) \times (n-1)$ matrix obtained from $(\partial I - A)$ by removing row $j$ and column $i$. Then, the following equations are input-output equations for $M$:
\[
\det(\partial I - A)\, y_i = \sum_{j \in In} (-1)^{i+j} \det(\partial I - A)_{ji}\, u_j \quad \text{for } i \in Out. \tag{3}
\]

We will refer to the equations (3) as the input-output equations of $M$. (See, for instance, the examples below.)

Remark 2.4. By convention, the determinant of the empty matrix equals 1, so when $n = 1$, we have $\det(\partial I - A)_{11} := 1$ in the input-output equation (3).
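The operator $\det(\partial I - A)$ on the left-hand side of (3) is the characteristic polynomial of $A$ with $\partial$ in place of the variable, so its coefficients can be computed exactly for a numeric instance. The sketch below is an illustration under stated assumptions, not the method of the paper: it uses the standard Faddeev-LeVerrier recursion, the helper names `char_poly` and `poly_mul` are introduced here, and the matrix is the Figure 1 compartmental matrix with hypothetical rates $a_{21}=2$, $a_{12}=3$, $a_{32}=5$, $a_{43}=7$, $a_{34}=11$, $a_{01}=13$, $a_{04}=17$.

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients [1, c1, ..., cn] of det(s*I - A) = s^n + c1*s^(n-1) + ... + cn,
    computed exactly with the Faddeev-LeVerrier recursion."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        # M_k = A * M_{k-1} + c_{k-1} * I
        AM = [[sum(A[r][t] * M[t][c] for t in range(n)) for c in range(n)]
              for r in range(n)]
        M = [[AM[r][c] + (coeffs[-1] if r == c else 0) for c in range(n)]
             for r in range(n)]
        # c_k = -trace(A * M_k) / k
        trace = sum(sum(A[r][t] * M[t][r] for t in range(n)) for r in range(n))
        coeffs.append(-trace / k)
    return coeffs

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Figure 1 compartmental matrix for the hypothetical rates listed above.
A = [[-15, 3, 0, 0],
     [2, -8, 0, 0],
     [0, 5, -7, 11],
     [0, 0, 7, -28]]
lhs = char_poly(A)                       # coefficients of det(dI - A)
top = char_poly([[-15, 3], [2, -8]])     # block for compartments 1 and 2
bottom = char_poly([[-7, 11], [7, -28]]) # block for compartments 3 and 4
```

Because this $A$ is block lower-triangular, `lhs` agrees with `poly_mul(top, bottom)`; this factorization is what drives the GCD discussion later in the section.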
Remark 2.5. Proposition 2.3 requires that $M$ have at least one input. This hypothesis, not stated in [14, Theorem 2], is required in the proof. Namely, Cramer's rule is applied to the system $(\partial I - A)x = u$, which requires that the input vector $u$ be a nonzero vector (i.e., at least one input).

Remark 2.6 (Input-output equation for models with no inputs and one output). In spite of Remark 2.5, the input-output equation (3) generalizes to models with no inputs and a single output $y_i$. Namely, the input-output equation in this case is just $\det(\partial I - A)\, y_i = 0$, due to the fact that the ODE system is $(\partial I - A)x = 0$ and thus has a nonzero solution $x$ if and only if $\det(\partial I - A) = 0$. This result is seen easily from the transfer function approach to finding input-output equations, which is detailed in [5].

Remark 2.7 (Our contribution to Proposition 2.3). In [14, Theorem 2], it is claimed that an input-output equation can be obtained by dividing both sides of the input-output equation (3) by the greatest common divisor (GCD) among the differential polynomials $\det(\partial I - A)$ and the $\det(\partial I - A)_{ji}$'s (for those $j \in In$). The resulting equation, however, is not always an input-output equation (see Example 2.10). Our contribution, therefore, is to correct the input-output equation stated in [14, Theorem 2].

Remark 2.7 motivates the following definition and subsequent question.

Definition 2.8. Let $M = (G, In, Out, Leak)$ be a linear compartmental model with compartmental matrix $A$. Let $i \in Out$. The input-output GCD of $y_i$ with respect to $M$ is the GCD of the following set of nonzero differential polynomials:
\[
\big( \{\det(\partial I - A)\} \cup \{\det(\partial I - A)_{ji} \mid j \in In\} \big) \setminus \{0\}.
\]

Question 2.9. Consider the equation obtained by dividing both sides of the input-output equation (3) by the input-output GCD of $y_i$. For which models is this equation an input-output equation?

For models whose input-output GCD is 1, the answer to Question 2.9 is automatically affirmative, so a partial answer to this question was given by Meshkat, Sullivant, and Eisenberg, who showed that the GCD is 1 for strongly connected models with at least one leak and at least one input [14, Corollary 1]. One of our goals here is to generalize that result by removing the requirement of having leaks (Proposition 3.17 in the next section). Moreover, even for models that are not strongly connected, we make progress toward Question 2.9 by showing that the downstream components correspond to factors of the GCD (Theorem 3.8).

Example 2.10. Consider the following model $M$ for which $In = Out = \{2\}$:

[Figure: two-compartment model with a single edge 1 → 2 labeled $a_{21}$; compartment 2 is both the input and the output compartment.]

Following Proposition 2.3, the input-output equation of $M$ is given by:
\[
\det(\partial I - A)\, y_2 = (-1)^{2+2} \det(\partial I - A)_{22}\, u_2, \tag{4}
\]
where
\[
\partial I - A = \begin{pmatrix} d/dt + a_{21} & 0 \\ -a_{21} & d/dt \end{pmatrix}.
\]
Therefore, the input-output equation (4) is
\[
\frac{d}{dt}\left(\frac{d}{dt} + a_{21}\right) y_2 = \left(\frac{d}{dt} + a_{21}\right) u_2, \tag{5}
\]
that is, $y_2'' + a_{21} y_2' = u_2' + a_{21} u_2$. If we divide both sides of the input-output equation (5) by the GCD of $\frac{d}{dt}\left(\frac{d}{dt} + a_{21}\right)$ and $\left(\frac{d}{dt} + a_{21}\right)$, which is $\left(\frac{d}{dt} + a_{21}\right)$, we obtain $y_2' = u_2$. This equation is not an input-output equation of $M$, because the ODEs of $M$ satisfy $y_2' = a_{21} x_1 + u_2 \neq u_2$.
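The failure of the divided equation in Example 2.10 can be traced directly from the ODEs. The following worked derivation is a sketch, assuming that the model of Example 2.10 consists of the single edge 1 → 2 with rate $a_{21}$ and no leaks; the initial value $x_1(0)$ is notation introduced here.

```latex
\begin{align*}
x_1' &= -a_{21} x_1, \qquad x_2' = a_{21} x_1 + u_2, \qquad y_2 = x_2, \\
y_2' &= a_{21} x_1 + u_2, \\
y_2'' &= a_{21} x_1' + u_2' = -a_{21}^2 x_1 + u_2', \\
y_2'' + a_{21} y_2' &= u_2' + a_{21} u_2, \\
y_2' - u_2 &= a_{21} x_1 = a_{21}\, x_1(0)\, e^{-a_{21} t} \neq 0
  \quad \text{whenever } a_{21}\, x_1(0) \neq 0 .
\end{align*}
```

The fourth line recovers the second-order equation of the example, which holds along every solution. The last line shows that $y_2' = u_2$ holds only along solutions with $x_1(0) = 0$, so it is not an input-output equation.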

Example 2.11 (Example 2.1, continued). Returning to the model in Figure 1, with $In = Out = \{1, 3\}$, we compute the input-output equation involving $y_1$ using Proposition 2.3. We will see that, in this example, dividing by the input-output GCD does give an input-output equation even though this need not be the case in general (recall Remark 2.7).

First, for $i = 1$, the input-output equation (3) is
\[
\det(\partial I - A)\, y_1 = (-1)^{1+1} \det(\partial I - A)_{11}\, u_1 + (-1)^{1+3} \det(\partial I - A)_{31}\, u_3. \tag{6}
\]
Examining the compartmental matrix $A$ in equation (2), we see that in the $3 \times 3$ matrix $(\partial I - A)_{31}$, the upper-right $2 \times 2$ submatrix is the zero matrix. So, $\det(\partial I - A)_{31} = 0$. Next, $(\partial I - A)$ and $(\partial I - A)_{11}$ are block lower-triangular with lower block as follows:
\[
B := \begin{pmatrix} d/dt + a_{43} & -a_{34} \\ -a_{43} & d/dt + a_{04} + a_{34} \end{pmatrix}.
\]
Moreover, it is straightforward to check that the GCD of $\det(\partial I - A)$ and $\det(\partial I - A)_{11}$ is $g_1 = \det B$. Dividing both sides of the input-output equation (6) by $g_1$ yields the following equation:
\[
y_1'' + (a_{01} + a_{21} + a_{12} + a_{32})\, y_1' + (a_{01}a_{12} + a_{01}a_{32} + a_{21}a_{32})\, y_1 = u_1' + (a_{12} + a_{32})\, u_1, \tag{7}
\]
and we can verify, e.g., using the software DAISY [3], that (7) is an input-output equation for the model. Later, we will see that we can also obtain this equation via Theorem 3.8, by restricting the model to those compartments that flow to compartment-1.
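The GCD claim in Example 2.11 follows from the block structure of $\partial I - A$. The derivation below is a sketch, assuming the edge set of Figure 1 is 1 → 2, 2 → 1, 2 → 3, 3 → 4, 4 → 3 with leaks at compartments 1 and 4; the name $P$ for the upper-left block is introduced here.

```latex
\begin{align*}
P &:= \begin{pmatrix}
  \tfrac{d}{dt} + a_{01} + a_{21} & -a_{12} \\
  -a_{21} & \tfrac{d}{dt} + a_{12} + a_{32}
\end{pmatrix}, \\
\det(\partial I - A) &= \det P \cdot \det B, \\
\det(\partial I - A)_{11} &= \left(\tfrac{d}{dt} + a_{12} + a_{32}\right) \det B, \\
\det P &= \tfrac{d^2}{dt^2} + (a_{01} + a_{21} + a_{12} + a_{32})\,\tfrac{d}{dt}
  + a_{01}a_{12} + a_{01}a_{32} + a_{21}a_{32}.
\end{align*}
```

Substituting $-(a_{12} + a_{32})$ for $d/dt$ in $\det P$ gives $-a_{12}a_{21}$, so $\det P$ and $d/dt + a_{12} + a_{32}$ have no common factor when $a_{12}a_{21} \neq 0$. The GCD is therefore $\det B$, and cancelling it leaves $\det P \cdot y_1 = (d/dt + a_{12} + a_{32})\, u_1$, the second-order equation of the example.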
For the other output, $i = 3$, the input-output equation (3) is
\[
\det(\partial I - A)\, y_3 = (-1)^{3+1} \det(\partial I - A)_{13}\, u_1 + (-1)^{3+3} \det(\partial I - A)_{33}\, u_3,
\]
which simplifies to the following input-output equation:
\begin{align}
& y_3^{(4)} + (a_{01} + a_{21} + a_{12} + a_{32} + a_{43} + a_{04} + a_{34})\, y_3^{(3)} \notag \\
& \quad + \big(a_{01}a_{12} + a_{01}a_{32} + a_{21}a_{32} + a_{04}a_{43} + (a_{01} + a_{21} + a_{12} + a_{32})(a_{43} + a_{04} + a_{34})\big)\, y_3'' \notag \\
& \quad + \big((a_{01}a_{12} + a_{01}a_{32} + a_{21}a_{32})(a_{43} + a_{04} + a_{34}) + a_{04}a_{43}(a_{01} + a_{21} + a_{12} + a_{32})\big)\, y_3' \notag \\
& \quad + (a_{01}a_{12} + a_{01}a_{32} + a_{21}a_{32})(a_{04}a_{43})\, y_3 \notag \\
& = (a_{21}a_{32})\, u_1' + a_{21}a_{32}(a_{04} + a_{34})\, u_1 + u_3^{(3)} + (a_{01} + a_{21} + a_{12} + a_{32} + a_{04} + a_{34})\, u_3'' \notag \\
& \quad + \big(a_{01}a_{12} + a_{01}a_{32} + a_{21}a_{32} + (a_{04} + a_{34})(a_{01} + a_{21} + a_{12} + a_{32})\big)\, u_3' \notag \\
& \quad + (a_{01}a_{12} + a_{01}a_{32} + a_{21}a_{32})(a_{04} + a_{34})\, u_3. \tag{8}
\end{align}

2.2. Identifiability.

A linear compartmental model is generically structurally identifiable if, from a generic choice of the inputs and initial conditions, the parameters of the model can be recovered from exact measurements of both the inputs and the outputs [2, 11]. We now define this concept precisely.

Definition 2.12. Let $M = (G, In, Out, Leak)$ be a linear compartmental model. The coefficient map is the function $c : \mathbb{R}^{|E| + |Leak|} \to \mathbb{R}^k$ that is the vector of all non-monic coefficient functions of every differential term of the form $y_i^{(j)}$ and $u_i^{(j)}$ in the input-output equations (3) (here $k$ is the total number of non-monic coefficients). Then $M$ is:
(1) globally identifiable if $c$ is one-to-one, and is generically globally identifiable if $c$ is one-to-one outside a set of measure zero.

(2) locally identifiable if around every point in $\mathbb{R}^{|E| + |Leak|}$ there is an open neighborhood $U$ such that $c : U \to \mathbb{R}^k$ is one-to-one, and is generically locally identifiable if, outside a set of measure zero, every point in $\mathbb{R}^{|E| + |Leak|}$ has such an open neighborhood $U$.
(3) unidentifiable if $M$ is not generically locally identifiable.

Remark 2.13. While we define the coefficient map and, consequently, identifiability, in terms of the input-output equations (3), we could have chosen any $m = |Out|$ algebraically independent input-output equations. In essence, identifiability (Definition 2.12) does not depend on the choice of which $m$ algebraically independent input-output equations are used to define the coefficient map (see [12, 15]).

Example 2.14 (Example 2.10, continued). Recall that the model $M$ in Example 2.10 has input-output equation $y_2'' + a_{21} y_2' = u_2' + a_{21} u_2$, so $M$ is globally identifiable.

The following result is [14, Proposition 2]:

Proposition 2.15 (Meshkat, Sullivant, and Eisenberg). A linear compartmental model $(G, In, Out, Leak)$ is generically locally identifiable if and only if the Jacobian matrix of its coefficient map $c$, when evaluated at a generic point, has rank equal to $|E| + |Leak|$.

In the next section, we prove new results on the form of the input-output equations and also clarify some of the subtleties that arise when finding the input-output equations (such as the GCD mentioned above). Then in Section 4, we prove new results on when adding or removing leaks preserves identifiability. Finally, in Section 5, we demonstrate some examples of the effects of adding or removing inputs, outputs, leaks, or edges.

3. Results on input-output equations and identifiability

In this section, we prove results that relate input-output equations of a model to those of certain submodels (Section 3.1) and then investigate how input-output GCDs (from Definition 2.8) are related to the model's strong components (Section 3.2).

3.1. Input-output equations and submodels.
Our main result, Theorem 3.8, implies that, under certain hypotheses, the input-output equation involving an output variable $y_i$ corresponds to the input-output equation arising from $y_i$'s output-reachable subgraph (see Definition 3.4). We must first explain how to restrict a model $M$ to such a subgraph.

Definition 3.1. For a linear compartmental model $M = (G, In, Out, Leak)$, let $H = (V_H, E_H)$ be an induced subgraph of $G$ that contains at least one output. The restriction of $M$ to $H$, denoted by $M_H$, is obtained from $M$ by removing all incoming edges to $H$, retaining all leaks and outgoing edges (which become leaks), and retaining all inputs and outputs in $H$; that is,
\[
M_H := (H, In_H, Out_H, Leak_H),
\]
where the input and output sets are $In_H := In \cap V_H$ and $Out_H := Out \cap V_H$, and the leak set is
\[
Leak_H := (Leak \cap V_H) \cup \{ i \in V_H \mid (i, j) \in E(G) \text{ for some } j \notin V_H \}.
\]
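The restriction of Definition 3.1 can be sketched as follows. This is a minimal illustration: the function name `restrict`, the encoding of an edge $j \to i$ as a dictionary key `(j, i)`, and the representation of a leak label as a list of summed parameter names are assumptions introduced here; the Figure 1 edge set is the one read off from Example 2.1.

```python
def restrict(edges, inputs, outputs, leaks, V_H):
    """Restriction M_H of a model to the induced subgraph on the vertex set V_H.

    edges: dict (j, i) -> label of the edge j -> i.
    leaks: dict i -> label a_0i of the leak from compartment i.
    Returns (edges_H, In_H, Out_H, leak_labels); each leak label is a list of
    parameter names whose sum is the label of that leak in M_H.
    """
    V_H = set(V_H)
    # Keep edges inside H; incoming edges (from outside H) are dropped.
    edges_H = {(j, i): a for (j, i), a in edges.items() if j in V_H and i in V_H}
    # Existing leaks of compartments in H are retained.
    leak_labels = {k: [leaks[k]] for k in leaks if k in V_H}
    # Outgoing edges (from H to outside H) become leaks, or are added to one.
    for (j, i), a in edges.items():
        if j in V_H and i not in V_H:
            leak_labels.setdefault(j, []).append(a)
    return edges_H, set(inputs) & V_H, set(outputs) & V_H, leak_labels

# Figure 1, with symbolic labels.
edges = {(1, 2): "a21", (2, 1): "a12", (2, 3): "a32", (3, 4): "a43", (4, 3): "a34"}
leaks = {1: "a01", 4: "a04"}
left = restrict(edges, {1, 3}, {1, 3}, leaks, {1, 2})
right = restrict(edges, {1, 3}, {1, 3}, leaks, {3, 4})
```

For the vertex set {1, 2}, the outgoing edge 2 → 3 turns into a leak from compartment 2 labeled `a32`; for {3, 4}, the incoming edge 2 → 3 is simply removed.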

Additionally, the labels of edges in $H$ are inherited from those of $G$, and labels of leaks are as follows:
\[
\text{label of leak from } k\text{-th compartment} =
\begin{cases}
a_{0k} + \sum_{\{ j \notin V_H \,\mid\, (k, j) \in E(G) \}} a_{jk} & \text{if } k \in Leak \cap V_H, \\
\sum_{\{ j \notin V_H \,\mid\, (k, j) \in E(G) \}} a_{jk} & \text{if } k \notin Leak \cap V_H.
\end{cases}
\]

Remark 3.2. A restriction $M_H$ is a linear compartmental model, together with leak-labels which may be sums of parameters. So, $M_H$ may have more than $|E_H| + |Leak_H|$ parameters.

Example 3.3 (Example 2.11, continued). Returning to the linear compartmental model $M$ from Figure 1, the restriction to the strong component containing compartment-1 and the strong component containing compartment-3 are, respectively, as follows:

[Figure: left, the restriction of $M$ to compartments 1 and 2; right, the restriction of $M$ to compartments 3 and 4. Labels shown: $a_{01}$, $a_{12}$, $a_{32}$, $a_{34}$, $a_{43}$, $a_{04}$.]

Consider again the input-output equations of $M$ for the two outputs, $y_1$ and $y_3$, which were given in equations (7) and (8), respectively. Equation (7) is precisely the input-output equation for the model of the restriction above on the left, while equation (8) involves parameters in the full model and does not arise from the restriction on the right. The reason for this difference, explained below in Theorem 3.8, is that the model on the left is upstream of the model on the right, but not vice-versa.

Definition 3.4. For a linear compartmental model $M = (G, In, Out, Leak)$, let $i \in Out$. The output-reachable subgraph to $i$ (or to $y_i$) is the induced subgraph of $G$ containing all vertices $j$ for which there is a directed path in $G$ from $j$ to $i$.

Example 3.5. Returning to the model in Figure 1, the output-reachable subgraph to $y_1$ is induced by the vertices 1 and 2, so the resulting model of the restriction is the one depicted on the left-hand side in Example 3.3. On the other hand, the output-reachable subgraph to $y_3$ is induced by all 4 vertices; so, the model of the restriction is the original model in Figure 1.

Remark 3.6 (Output-reachable subgraphs and structural observability). A linear compartmental model is output connectable [7] if every compartment has a directed path leading from it to an output compartment.
In control theory, a linear compartmental model is structurally observable if every state variable $x_i(t)$ can be determined from the inputs $u_j(t)$ and the outputs $y_k(t)$ in some finite (but unspecified) time [5]. A linear compartmental model is structurally observable if and only if it is output connectable [7]. Thus output-reachable subgraphs are structurally observable. The model from Figure 1 is thus structurally observable.

The following lemma states that an input-output equation of a model of a restriction $M_H$ is an input-output equation for the full model $M$ as long as there are no edges from outside of $H$ into $H$.

Lemma 3.7. For a linear compartmental model $M = (G, In, Out, Leak)$, let $H = (V_H, E_H)$ be an induced subgraph of $G$ such that there is no directed edge $(i, j)$ in $G$ with $i \notin V_H$ and $j \in V_H$. Then every input-output equation of $M_H$ is an input-output equation of $M$.
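Both the output-reachable subgraph of Definition 3.4 and the hypothesis of Lemma 3.7 are simple graph computations. The sketch below is illustrative only: the function names `output_reachable` and `has_no_incoming_edges` and the encoding of an edge $j \to i$ as the pair `(j, i)` are assumptions introduced here, and the Figure 1 edge set is the one read off from Example 2.1.

```python
def output_reachable(edges, i):
    """Vertex set of the output-reachable subgraph to i:
    all j with a directed path from j to i (i itself included)."""
    preds = {}
    for (src, dst) in edges:
        preds.setdefault(dst, set()).add(src)
    seen, stack = {i}, [i]
    while stack:  # depth-first search along reversed edges
        v = stack.pop()
        for w in preds.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def has_no_incoming_edges(edges, V_H):
    """Check the hypothesis of Lemma 3.7: no edge from outside V_H into V_H."""
    return not any(src not in V_H and dst in V_H for (src, dst) in edges)

# Figure 1: edges 1 -> 2, 2 -> 1, 2 -> 3, 3 -> 4, 4 -> 3.
edges = {(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)}
H1 = output_reachable(edges, 1)
H3 = output_reachable(edges, 3)
```

An output-reachable vertex set is closed under taking predecessors, so it always satisfies the hypothesis of Lemma 3.7. By contrast, the vertex set {3, 4} does not, because of the edge 2 → 3.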

Proof. There are no directed edges from outside of $H$ into $H$, so the ODEs of $M$ are obtained from those of $M_H$ simply by appending the ODEs for each state variable $x_i(t)$ with $i \notin V_H$ (this follows from how restrictions are constructed in Definition 3.1). Accordingly, any input-output equation of $M_H$, that is, any equation involving only the input and output variables (and their derivatives) and parameters in $M_H$ that holds along solutions to the ODEs of $M_H$, also holds along solutions to the ODEs of $M$ and therefore is also an input-output equation of $M$.

The main result of this section, Theorem 3.8, states that to obtain an input-output equation involving an output variable $y_i$, it suffices to consider the corresponding restriction $M_H$ arising from the output-reachable subgraph $H$ to $y_i$.

Theorem 3.8. Let $M = (G, In, Out, Leak)$ be a linear compartmental model. Let $i \in Out$, and assume that there exists a directed path from some input compartment to compartment-$i$. Let $H$ denote the output-reachable subgraph to $y_i$, and let $A_H$ denote the compartmental matrix for the restriction $M_H$. Then the following is an input-output equation for $M$ involving $y_i$:
\[
\det(\partial I - A_H)\, y_i = \sum_{j \in In \cap V_H} (-1)^{i+j} \det(\partial I - A_H)_{ji}\, u_j, \tag{9}
\]
where $(\partial I - A_H)_{ji}$ denotes the matrix obtained from $(\partial I - A_H)$ by removing the row corresponding to compartment-$j$ and the column corresponding to compartment-$i$. Thus, this input-output equation (9) involves only the output-reachable subgraph to $y_i$.

Proof. The set of input compartments of $M_H$ is $In \cap V_H$ (Definition 3.1), and now this result follows directly from Lemma 3.7 and Proposition 2.3.

Remark 3.9 (Output-reachable subgraphs and algebraic observability). A model (linear or nonlinear) is algebraically observable if every state variable can be recovered from observation of the input and output alone [12], a more general notion of the control theory concept of structural observability.
A model is algebraically observable if and only if the sum of the differential orders of all input-output equations from a reduced set¹ is equal to the number of state variables, $n$ [6, Proposition 5]. Theorem 3.8 verifies that, for the case of a single output $y_i$, the input-output equation (9) for the restriction to the output-reachable subgraph $H$ to $y_i$ has differential order $n = |V(H)|$, and thus the restriction is algebraically observable.

¹ Here the particular reduced set of input-output equations of interest is the characteristic set, a triangular system that generates the same dynamics as the original system [6].

Remark 3.10. In two of our results, Proposition 2.3 and Theorem 3.8, each input-output equation involves only a single output. If instead we find the input-output equations using an alternative approach, such as using the characteristic set as in DAISY [3], then some of the input-output equations may involve several outputs. While this does not change the overall identifiability results (see Remark 2.13), this can change the coefficients and thus possibly change the identifiable functions of parameters. For more on identifiable functions of parameters, see [12].

Example 3.11 (Example 2.1, continued). Returning to the model in Figure 1, with $In = Out = \{1, 3\}$, we compute the input-output equations (9). First, for $i = 1$, we begin with:
\[
\det(\partial I - A_{H_1})\, y_1 = (-1)^{1+1} \det(\partial I - A_{H_1})_{11}\, u_1, \tag{10}
\]

where $A_{H_1}$ comes from the restriction to the output-reachable subgraph to $y_1$:
\[
A_{H_1} := \begin{pmatrix} -a_{01} - a_{21} & a_{12} \\ a_{21} & -a_{12} - a_{32} \end{pmatrix}.
\]
The resulting input-output equation (10) is exactly the one displayed earlier in (7). For the other output, $i = 3$, the output-reachable subgraph to $y_3$ is the full graph, and so the input-output equation (9) is the one given earlier in (8).

Theorem 3.8 allows us to prove two results on identifiability (Corollaries 3.12 and 3.15).

Corollary 3.12. Let $M = (G, In, Out, Leak)$ be a linear compartmental model with at least one input such that there is a compartment $j$ such that (i) $j$ is not in any output-reachable subgraph of $G$ and (ii) $j$ is a leak compartment or there is a directed edge $(j, k)$ out of $j$. Then $M$ is unidentifiable.

Proof. Let $m = |Out|$. Since identifiability (Definition 2.12) does not depend on the choice of which $m$ algebraically independent input-output equations are used to define the coefficient map (see Remark 2.13), we can choose as our input-output equations those appearing in (9). These equations, by Theorem 3.8, do not involve the leak parameter $a_{0j}$ (if $j$ is a leak compartment) or the edge parameter $a_{kj}$ (if $(j, k)$ is an edge). Hence, $M$ is unidentifiable.

Example 3.13. In the following model $M$, compartment-2 is a leak compartment that is not in the output-reachable subgraph to the (unique) output compartment-1:

[Figure: model with leak labels $a_{01}$ and $a_{02}$; diagram not reproduced.]

Hence, by Corollary 3.12, $M$ is unidentifiable. Indeed, the output-reachable subgraph is induced by compartment-1, so Theorem 3.8 yields the input-output equation $y_1' + (a_{01} + a_{21})\, y_1 = 0$, which does not involve the leak parameter $a_{02}$.

The next result states that the "observable component" submodel (Definition 3.14) of an identifiable model is always identifiable.

Definition 3.14. For a linear compartmental model $M$, the observable component is the union of all output-reachable subgraphs to all outputs $y_i$ in the model.

Corollary 3.15. Let $M$ be a linear compartmental model with at least one input. Let $H$ denote the observable component.
If $M$ is generically globally (respectively, locally) identifiable, then so is $M_H$.

Proof. Assume $M = (G, In, Out, Leak)$ is identifiable. Let $H$ be the observable component, that is, the union of all output-reachable subgraphs to outputs $y_i$ (for $i \in Out$). As in the proof of Corollary 3.12, we can choose our input-output equations of $M$ to be those arising from each output-reachable subgraph, as in (9). By construction, these equations hold along solutions to the ODEs of $M$, and involve only the variables (and their derivatives) and parameters in $M_H$ and thus also hold along solutions of $M_H$ (recall from the proof of Lemma 3.7 that the ODEs of $M$ are obtained from those of $M_H$ by appending the ODEs for state variables $x_i(t)$ for all $i \notin V_H$). These equations are therefore input-output equations for $M_H$, and so $M_H$ is identifiable.

3.2. Input-output GCDs.

We now return to Question 2.9, which concerns input-output GCDs (Definition 2.8). Recall that in [14, Theorem 2], each input-output equation was divided by the corresponding GCD, and also this GCD was proven to be 1 in the case of strongly connected models with at least one leak [14, Corollary 1]. This motivates the following question:

Question 3.16. For which models is every input-output GCD equal to 1?

In this subsection, we strengthen [14, Corollary 1] to allow for strongly connected models without leaks (Proposition 3.17). Then we elucidate some of the factors of the input-output GCD, and thereby make progress toward finding the full form of this GCD (Proposition 3.20). As a consequence, we find a necessary condition for the GCD to be 1 (Corollary 3.21).

Proposition 3.17 (Input-output GCDs for strongly connected models). For a strongly connected linear compartmental model with at least one input, every input-output GCD is 1.

Proof. If $M$ has at least 1 leak, this result is [14, Corollary 1]. Assume that $M$ has no leaks. We consider first the case of $n = 1$ compartment, with an input and output at that compartment. There are no leaks or edges, and hence no parameters. The input-output equation from Proposition 2.3 is $y_1' = u_1$, so the input-output GCD is 1.

Now assume that $n \geq 2$ (and $M$ has no leaks). Let $A$ be the compartmental matrix. By definition, the input-output GCD of an output variable $y_i$, denoted by $g_i$, is the GCD among the polynomials $\det(\partial I - A)$ and the $\det(\partial I - A)_{ji}$'s for $j \in In$. Our goal is to prove that $g_i = 1$ for all $i \in Out$. For ease of notation, we consider instead the characteristic polynomial $\det(\lambda I - A)$ and the related polynomials $\det(\lambda I - A)_{ji}$ for $j \in In$. We must show that their GCD is 1.
By hypothesis, $M$ is strongly connected and has no leaks, so $A$ is the negative of the Laplacian matrix of a strongly connected directed graph $G = (V, E)$. So, by following the argument in [14, Proof of Theorem 3], a factorization of $\det(\lambda I - A)$ into two irreducible polynomials is given by $\lambda \cdot (\det(\lambda I - A)/\lambda)$. (Here, the $n \geq 2$ assumption is used.) Therefore, we need only show that neither (i) $\lambda$ nor (ii) $\det(\lambda I - A)/\lambda$ divides any of the $\det(\lambda I - A)_{ji}$'s.

For (i), we must show that $\det(\lambda I - A)_{ji}|_{\lambda = 0}$ is a nonzero polynomial. Indeed, $\det(\lambda I - A)_{ji}|_{\lambda = 0} = \det(-A)_{ji}$, which equals (up to sign) the $(i, j)$-th cofactor of the Laplacian matrix of the strongly connected graph $G$. This determinant, after setting every edge parameter $a_{kl}$ equal to 1, is precisely (by the Matrix-Tree Theorem) the number of directed spanning trees of $G$ rooted at $i$, and this number is at least 1 (because $G$ is strongly connected). So, $\det(\lambda I - A)_{ji}|_{\lambda = 0}$ is indeed nonzero.

Now we consider (ii). Viewing $\det(\lambda I - A)/\lambda$ as a univariate polynomial in $\lambda$, its leading term is $\lambda^{n-1}$. As for $\det(\lambda I - A)_{ji}$, we first assume that $i \neq j$. Then the matrix $(\lambda I - A)_{ji}$ contains only $n - 2$ $\lambda$'s in its entries (both the $i$-th and $j$-th diagonal entries of $(\lambda I - A)$ were removed). So, in this case, $\det(\lambda I - A)_{ji}$ has degree less than $n - 1$ and therefore cannot be divisible by the degree-$(n-1)$ polynomial $\det(\lambda I - A)/\lambda$.

In the remaining case, when $i = j$, it is straightforward to check that the coefficient of $\lambda^{n-2}$ in $\det(\lambda I - A)/\lambda$ equals the following sum over all edges in the graph $G$:
\[
\sum_{(k, l) \in E} a_{lk}. \tag{11}
\]
Similarly, the coefficient of $\lambda^{n-2}$ in $\det(\lambda I - A)_{ji}$ (when $i = j$) equals the sub-sum over edges that do not originate at $i$, namely, $\sum_{(k, l) \in E,\ k \neq i} a_{lk}$. This sum is a strict sub-sum of the sum (11), as $G$ is strongly connected. So, as desired, $\det(\lambda I - A)/\lambda$ does not divide $\det(\lambda I - A)_{ji}$.

We next consider the case when $M$ need not be strongly connected. The next result shows that the input-output GCD of $y_i$ is a multiple of certain upstream, neutral, and downstream components that do not contain an input leading to the compartment-$i$.

Definition 3.18. Let $M = (G, In, Out, Leak)$ be a linear compartmental model. Let $i \in Out$ be such that there exists a directed path in $G$ from some $j \in In$ to $i$. The set of compartments lying along such paths induces the input-output-reachable subgraph to $i$ (or to $y_i$); more precisely, this is the induced subgraph of $G$ with vertex set containing $i$, every $j \in In$ such that there is a directed path from $j$ to $i$, and every compartment passed through by at least one such path.

Remark 3.19 (Input-output-reachable subgraphs and structural controllability/observability). A linear compartmental model is input connectable [7] if every compartment has a path leading to it originating from an input compartment. In control theory, a linear compartmental model is structurally (completely) controllable if an input can be found that transfers the state variables $x_i(t)$ from any initial state to any specified final state in finite time [5]. A trap is a strongly connected set of compartments from which no paths exist to any compartment outside the trap, including the environment (i.e., leaks) [7].
A linear compartmental model is structurally controllable if and only if it is input connectable and it is possible to find disjoint paths starting at input compartments and such that every trap has a compartment at the end of one such path [7]. This means our notion of an input-output-reachable subgraph to $y_i$ corresponds to a structurally observable and structurally controllable model (in our case, there is either a single trap corresponding to the strongly connected component containing $y_i$ or no traps if that strongly connected component contains a leak). Note that this conclusion is stronger than our conclusion in Remark 3.6, where the output-reachable subgraph to $y_i$ is structurally observable. Being both structurally observable and structurally controllable means, from [7], that no model with a smaller number of compartments can be found that will fit the input-output data induced by this subgraph. In addition, no model with a larger number of compartments can be both structurally controllable and structurally observable, by definition. Thus, our notion of an input-output-reachable subgraph to $y_i$ has the desirable quality from control theory of being the maximal subgraph corresponding to a model that is both structurally controllable and structurally observable.

Proposition 3.20. Let $M = (G, In, Out, Leak)$ be a linear compartmental model. Let $i \in Out$ be such that there exists a directed path in $G$ from some $j \in In$ to $i$. Let $H$ denote the input-output-reachable subgraph to $i$. Let $H^c$ denote the subgraph induced by all compartments of $G$ that are not in $H$. Then the input-output GCD of $y_i$ is a multiple of $\det(\partial I - A_{H^c})$.
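To make the vertex sets of $H$ and $H^c$ concrete, here is a minimal sketch that computes them for a small graph. It assumes that "lying along a path from an input to $i$" can be read as "reachable from some input that reaches $i$, and able to reach $i$"; the function names and the 4-compartment example are hypothetical and not from the paper.

```python
def reachable(start, adj):
    """All vertices reachable from start (start included) by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def io_reachable(edges, inputs, i):
    """Vertex set of the input-output-reachable subgraph to output i.
    An edge (a, b) is an arrow a -> b."""
    fwd, bwd = {}, {}
    for a, b in edges:
        fwd.setdefault(a, []).append(b)
        bwd.setdefault(b, []).append(a)
    can_reach_i = reachable(i, bwd)          # vertices with a path to i
    from_inputs = set()
    for j in inputs:
        if j in can_reach_i:                 # only inputs that lead to i count
            from_inputs |= reachable(j, fwd)
    return from_inputs & can_reach_i

# Hypothetical model: 1 <-> 2, 2 -> 3, 4 -> 1, with In = {1} and output 1.
edges = [(1, 2), (2, 1), (2, 3), (4, 1)]
H = io_reachable(edges, inputs={1}, i=1)
H_complement = {1, 2, 3, 4} - H
```

Here compartment 3 is downstream of $H$ and compartment 4 is upstream of $H$ but carries no input, so both land in $H^c$.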

Proof. If $G$ is strongly connected (that is, $G = H$), then the result holds trivially. Assume $G$ is not strongly connected. Consider all strong components $C$ of $G$ that are not in $H$ and also are upstream of $H$, that is, there exists a directed path in $G$ from $C$ to $H$. Let $U$ be the subgraph of $G$ induced by all such strong components. Next, let $D$ denote the subgraph of $G$ induced by all strong components of $G$ that are in neither $H$ nor $U$. (So, $D$ includes downstream components of $H$.) By construction, the input-output-reachable subgraph $H$ is a union of strong components of $G$. So, the vertices of $U$, $D$, and $H$ partition the vertices of $G$.

We claim that there are no directed edges (i) from a compartment in $H$ to one in $U$, (ii) from a compartment in $D$ to one in $H$, nor (iii) from a compartment in $D$ to one in $U$. For (i) and (ii), this claim follows from the definition of strong component and by construction of $H$, $U$, and $D$. For (iii), such an edge would violate the definition of $U$: there would be a strong component in $D$ that should have been in $U$.

Reorder the compartments so that those in $U$ come first, and then those in $H$, and finally those in $D$. The lack of edges from $H$ to $U$, from $D$ to $H$, and from $D$ to $U$ yields the following block lower-triangular form for the compartmental matrix for $M$:
\[ A = \begin{pmatrix} A_U & 0 & 0 \\ \ast & A_H & 0 \\ \ast & \ast & A_D \end{pmatrix}. \tag{12} \]
By construction, the diagonal blocks $A_U$, $A_H$, and $A_D$ are the compartmental matrices of the restriction of $M$ to, respectively, $U$, $H$, and $D$. By construction, the compartmental matrix for $H^c$ has the following form:
\[ A_{H^c} = \begin{pmatrix} A_U & 0 \\ \ast & A_D \end{pmatrix}. \]
So,
\[ \det(\partial I - A_{H^c}) = \det(\partial I - A_U)\, \det(\partial I - A_D). \tag{13} \]
Similarly, the following equality follows from (12):
\[ \det(\partial I - A) = \det(\partial I - A_U)\, \det(\partial I - A_H)\, \det(\partial I - A_D). \tag{14} \]
Recall that the input-output GCD of $y_i$ is the GCD of the determinants $\det(\partial I - A)$ and the $\det(\partial I - A)^{ji}$'s (for $j \in In$).
So, to show that this GCD is a multiple of $\det(\partial I - A_{H^c})$, using equations (13) and (14), we need only show the following claims for every $j \in In$:
(i) if $j$ is not a compartment in $H$, then $\det(\partial I - A)^{ji} = 0$.
(ii) if $j$ is in $H$, then $\det(\partial I - A)^{ji} = \det(\partial I - A_U)\, \det(\partial I - A_H)^{ji}\, \det(\partial I - A_D)$.

To prove claim (i), assume that $j \in In$ is not in $H$. We know that $j$ is in $D$, because compartments in $U$ are not input compartments (otherwise they would be in $H$). Removing column-$i$ from the block lower-triangular matrix $\partial I - A$ (one of the columns involving the $(\partial I - A_H)$-block) drops the rank by 1, and then removing row-$j$ (one of the rows involving the $(\partial I - A_D)$-block) drops the rank again by 1. So, $\det(\partial I - A)^{ji} = 0$.
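Claims (i) and (ii), together with the factorization (14), can be sanity-checked numerically. The sketch below is illustrative only: the 4-compartment model, its rate values, and the helper names are assumptions, and the operator $\partial$ is replaced by a number `s`, so the check covers one evaluation point rather than the polynomial identity. The compartments are not reordered, since the determinants involved are unchanged by a simultaneous permutation of rows and columns.

```python
from fractions import Fraction

def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    sign, result = 1, Fraction(1)
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, len(m)):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return sign * result

def minor(m, row, col):
    """Delete one row and one column."""
    return [[x for c, x in enumerate(r) if c != col]
            for k, r in enumerate(m) if k != row]

def sub(m, idx):
    """Principal submatrix on the given indices (restriction to a vertex set)."""
    return [[m[r][c] for c in idx] for r in idx]

def s_minus(B, s):
    """The matrix s*I - B."""
    return [[(s if r == c else 0) - B[r][c] for c in range(len(B))]
            for r in range(len(B))]

# Hypothetical model on compartments 1..4 (indices 0..3):
# edges 1 -> 2, 2 -> 1, 2 -> 3, 4 -> 1, a leak from 3, In = {1, 3}, Out = {1}.
a21, a12, a32, a14, a03 = 2, 3, 5, 7, 11
s = 13  # numeric stand-in for the differential operator
A = [[-a21,  a12,          0,    a14],
     [ a21, -(a12 + a32),  0,    0],
     [ 0,    a32,         -a03,  0],
     [ 0,    0,            0,   -a14]]
U, H, D = [3], [0, 1], [2]  # upstream, input-output-reachable, downstream

full = det(s_minus(A, s))
pU = det(s_minus(sub(A, U), s))
pH = det(s_minus(sub(A, H), s))
pD = det(s_minus(sub(A, D), s))

# Claim (i): input j = 3 lies in D, output i = 1; remove row 3 and column 1.
claim_i = det(minor(s_minus(A, s), 2, 0))
# Claim (ii): input j = 1 lies in H; remove row 1 and column 1.
claim_ii_lhs = det(minor(s_minus(A, s), 0, 0))
claim_ii_rhs = pU * det(minor(s_minus(sub(A, H), s), 0, 0)) * pD
```

Note that the restriction to $H$ keeps the outflow $a_{32}$ on the diagonal of $A_H$, so the edge leaving $H$ acts as a leak of the restricted model.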

For claim (ii), assume that $j \in In$ is in $H$. In this case, both row-$j$ and column-$i$ of $\partial I - A$ involve the block $A_H$, so the submatrix $(\partial I - A)^{ji}$ is also block lower-triangular, with upper-left block equal to $(\partial I - A_U)$ and lower-right block $(\partial I - A_D)$. Therefore, $\det(\partial I - A)^{ji}$ factors as claimed in (ii).

The following result follows directly from Proposition 3.20:

Corollary 3.21. Consider a linear compartmental model $M = (G, In, Out, Leak)$ with at least one input. If the input-output GCD of an output variable $y_i$ is 1, then the input-output-reachable component to $y_i$ is the full graph $G$.

4. Identifiability results on adding or removing inputs, outputs, or leaks

In this section, we show that identifiability is preserved when inputs or outputs are added to a model (Proposition 4.1), or, under certain hypotheses, when a leak is added (Theorem 4.3) or removed (Proposition 4.6). For a result on which edges can be safely removed without losing identifiability, we refer the reader to [10, Theorem 3.1].

Our first result is a proof of the widely accepted fact that, if a model is identifiable, adding inputs or outputs preserves identifiability. We include its proof for completeness, as we could not find a formal proof in the literature.

Proposition 4.1 (Adding inputs or outputs). Let $M = (G, In, Out, Leak)$ be a linear compartmental model that has at least one input, and let $\widetilde{M}$ be a model obtained from $M$ by adding an input or an output (in other words, $In$ or $Out$ is enlarged by one compartment). If $M$ is generically globally (respectively, locally) identifiable, then so is $\widetilde{M}$.

Proof. Adding an output yields a new input-output equation, while adding an input adds coefficients to existing input-output equations (by Proposition 2.3). Either case extends the coefficient map $c : \mathbb{R}^{|E| + |Leak|} \to \mathbb{R}^k$ to some $\widetilde{c} = (c, c') : \mathbb{R}^{|E| + |Leak|} \to \mathbb{R}^{k+\ell}$.
Specifically, in the case of adding an output, $c'$ corresponds to the coefficients of the additional equation from the new output, while in the case of adding an input, $c'$ corresponds to coefficients of the new input variable $u_j$ and its derivatives on the right-hand side of the input-output equations (3). If $c$ is generically one-to-one (respectively, generically finite-to-one) then so is $\widetilde{c}$.

Remark 4.2. The converse to Proposition 4.1 is not true in general (see Examples 5.4 and 5.5 in the next section). In other words, identifiability can be lost by removing inputs or outputs. Accordingly, there are minimal sets of outputs for identifiability; these sets were investigated by Anguelova, Karlsson, and Jirstrand [1]. However, finding minimal sets of inputs for a given output remains an open question.

The next results investigate how adding or removing a leak affects identifiability.

Theorem 4.3 (Adding one leak). Let $M$ be a linear compartmental model that is strongly connected and has at least one input and no leaks. Let $\widetilde{M}$ be a model obtained from $M$ by adding one leak. If $M$ is generically locally identifiable, then so is $\widetilde{M}$.

Proof. Let $M = (H, In, Out, Leak)$, with $H = (V, E)$, be a strongly connected linear compartmental model with $n$ compartments, at least 1 input, and no leaks. Let $\widetilde{M}$ be obtained from $M$ by adding a leak from one compartment, which we may assume is compartment-1.

As in (1), we write the ODEs of $M$ and $\widetilde{M}$, respectively, as follows:
\[ \frac{dx(t)}{dt} = A\, x(t) + u(t) \quad \text{and} \quad \frac{dx(t)}{dt} = \widetilde{A}\, x(t) + u(t). \]
Here $A$ is the $n \times n$ compartmental matrix for $M$, so the column sums are 0 (because $A$ is the negative of the Laplacian matrix of a graph). Also, $\widetilde{A}$ is obtained from $A$ by adding $-a_{01}$ (where $a_{01}$ is the new leak parameter) to the $(1,1)$-entry.

By Proposition 2.3, the following are the input-output equations for $M$:
\[ \det(\partial I - A)\, y_i = \sum_{j \in In} (-1)^{i+j} \det(\partial I - A)^{ji}\, u_j \quad \text{for } i \in Out. \tag{15} \]
We know that $|Out| \geq 1$, because $M$ is identifiable. As for $\widetilde{M}$, again by Proposition 2.3, the following are the input-output equations:
\[ \det(\partial I - \widetilde{A})\, y_i = \sum_{j \in In} (-1)^{i+j} \det(\partial I - \widetilde{A})^{ji}\, u_j \quad \text{for } i \in Out. \tag{16} \]

Let $c : \mathbb{R}^{|E|} \to \mathbb{R}^k$ be the coefficient map for $M$ arising from the coefficients of the input-output equations (15), where the first coefficient is chosen to be the coefficient of $y_i$ in the left-hand side of (15). Notice that this coefficient is independent of the choice of $i \in Out$ and, in fact, is equal to 0 because $M$ is strongly connected and has no leaks [14].

Similarly, let $\widetilde{c} : \mathbb{R}^{|E|+1} \to \mathbb{R}^k$ denote the coefficient map for $\widetilde{M}$ coming from the input-output equations (16), where the coefficients are chosen in the same order as for $c$. The first coefficient, $\widetilde{c}_1$, in contrast with the one in the previous coefficient map, is not equal to 0. In fact, we claim that this coefficient is as follows:
\[ \widetilde{c}_1 = a_{01} \sum_{T \in \tau_1} \pi_T, \tag{17} \]
where $\tau_1$ denotes the set of all (directed) spanning trees of $H$ consisting of $(n-1)$ edges, each of which is directed toward node 1 (the root of the tree), and $\pi_T$ denotes the following monomial in $\mathbb{Q}[a_{ji} \mid (i, j) \in E]$:
\[ \pi_T := \prod_{(i,j) \text{ is an edge of } T} a_{ji}. \]
Indeed, it is straightforward to check equality (17) using [10, Proposition 4.6]. The sum $\sum_{T \in \tau_1} \pi_T$ does not involve $a_{01}$ (by construction) and is a nonzero polynomial (because $H$ is a strongly connected digraph on $n$ nodes and hence contains at least one $(n-1)$-edge spanning tree rooted at node 1). Thus, using (17), we see that $\frac{\partial \widetilde{c}_1}{\partial a_{01}} = \sum_{T \in \tau_1} \pi_T$.
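As a quick numerical illustration of (17), the sketch below uses a hypothetical 3-compartment model with edges $1 \to 2$, $2 \to 3$, $3 \to 1$, $2 \to 1$ and a leak from compartment 1 (the rate values are arbitrary assumptions). For this graph the spanning trees directed toward node 1 are $\{2 \to 1, 3 \to 1\}$ and $\{2 \to 3, 3 \to 1\}$, so $\sum_{T \in \tau_1} \pi_T = a_{13}(a_{12} + a_{32})$. The first coefficient $\widetilde{c}_1$ is the constant term of $\det(\partial I - \widetilde{A})$, that is, $\det(-\widetilde{A})$.

```python
from fractions import Fraction

def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    sign, result = 1, Fraction(1)
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, len(m)):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return sign * result

# Hypothetical rates; a_ji is the rate of the edge i -> j.
a21, a32, a13, a12 = 2, 3, 5, 7

# Sum over the two spanning trees directed toward node 1 (worked out by hand).
tree_sum = a13 * (a12 + a32)

def minus_A_tilde(a01):
    """-A~ for the model with the leak a01 added to compartment 1."""
    return [[a21 + a01, -a12,       -a13],
            [-a21,       a12 + a32,  0],
            [0,         -a32,        a13]]

# Constant term of det(dI - A~) for several leak values, including a01 = 0.
first_coefficient = {a01: det(minus_A_tilde(a01)) for a01 in (0, 1, 4, 9)}
```

Setting `a01 = 0` recovers the leak-free model, for which the first coefficient vanishes, in line with the remark before (17).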
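So, $\frac{\partial \widetilde{c}_1}{\partial a_{01}}$ is a nonzero polynomial that does not involve $a_{01}$:
\[ \left( \frac{\partial \widetilde{c}_1}{\partial a_{01}} \right)\Big|_{a_{01}=0} = \frac{\partial \widetilde{c}_1}{\partial a_{01}} = \sum_{T \in \tau_1} \pi_T \quad \text{is a nonzero polynomial.} \tag{18} \]
Going beyond the first coordinate, we claim that the full coefficient map has the form:
\[ \widetilde{c} = c + a_{01} v, \tag{19} \]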

for some $v \in (\mathbb{Q}[a_{ji} \mid (i, j) \in E])^k$. Indeed, this claim follows from (15) and (16), and from the fact that $\widetilde{A}$ comes from adding $-a_{01}$ to one entry of $A$.

Now we consider $\mathrm{Jac}(c)$, the Jacobian matrix of $c$. The first row corresponds to the coefficient that we saw is 0, so this row is a row of 0's:
\[ \mathrm{Jac}(c) = \begin{pmatrix} 0 & \cdots & 0 \\ \ast & \cdots & \ast \\ \vdots & \ddots & \vdots \\ \ast & \cdots & \ast \end{pmatrix}. \tag{20} \]
When evaluated at a generic point, $\mathrm{Jac}(c)$ has (full) rank equal to $|E|$ (because $M$ is generically locally identifiable and by Proposition 2.15). So, there exists a choice of $|E|$ rows of $\mathrm{Jac}(c)$, which we index by $i_1, \dots, i_{|E|}$, so that the resulting $|E| \times |E|$ matrix, denoted by $J$, is generically full rank. That is, $\det J$ is a nonzero polynomial in $\mathbb{Q}[a_{ji} \mid (i, j) \in E]$.

By (19), we know $\mathrm{Jac}(\widetilde{c})$ is obtained from $\mathrm{Jac}(c)$ by first adding polynomial-multiples of $a_{01}$ to some entries, and then appending a column corresponding to the new parameter, $a_{01}$:
\[ \mathrm{Jac}(\widetilde{c}) = \left( \begin{array}{c|c} & \frac{\partial \widetilde{c}_1}{\partial a_{01}} \\ \mathrm{Jac}(c) + a_{01} K & \ast \\ & \vdots \end{array} \right), \tag{21} \]
for some $k \times |E|$ matrix $K$ with entries in $\mathbb{Q}[a_{ji} \mid (i, j) \in E]$. Let $\widetilde{J}$ denote the $(|E|+1) \times (|E|+1)$ submatrix of $\mathrm{Jac}(\widetilde{c})$ coming from choosing the first row and the rows indexed by $i_1, \dots, i_{|E|}$. Then, by construction and from using equations (18), (20), and (21), we obtain:
\[ \left( \det \widetilde{J} \right)\big|_{a_{01}=0} = \det \left( \widetilde{J}\big|_{a_{01}=0} \right) = \det \begin{pmatrix} 0 \ \cdots \ 0 & \frac{\partial \widetilde{c}_1}{\partial a_{01}} \\ J & \ast \end{pmatrix} = \pm \frac{\partial \widetilde{c}_1}{\partial a_{01}} \det J. \tag{22} \]
As noted earlier, $\frac{\partial \widetilde{c}_1}{\partial a_{01}}$ and $\det J$ are both nonzero polynomials. So, equation (22) implies that $\det \widetilde{J}$ also is a nonzero polynomial. So, $\mathrm{Jac}(\widetilde{c})$ generically has (full) rank equal to $|E|+1$. And thus, by Proposition 2.15, the model $\widetilde{M}$ is generically locally identifiable.

Remark 4.4. Our proof of Theorem 4.3 is similar to that for [10, Theorem 3.11], which analyzed whether identifiability is preserved when an edge of a linear compartmental model is deleted. Indeed, our results explain why, for the models we considered in [10], the leak parameters (labeled $a_{01}$ in [10]) do not divide the singular-locus equation (see Proposition 4.6).

We conjecture that the converse of Theorem 4.3 holds.

Conjecture 4.5 (Deleting one leak).
Let $M$ be a linear compartmental model that is strongly connected and has at least one input and exactly one leak. If $M$ is generically locally identifiable, then so is the model $\widetilde{M}$ obtained from $M$ by removing the leak.

The next result resolves one case of Conjecture 4.5.

Proposition 4.6. Let $M$ be a linear compartmental model that is strongly connected, and has an input, output, and leak in a single compartment (and has no other inputs, outputs, or leaks). If $M$ is generically locally identifiable, then so is the model obtained from $M$ by removing the leak.

Proof. Assume that $M = (G, \{1\}, \{1\}, \{1\})$ is a generically locally identifiable, strongly connected linear compartmental model with $n$ compartments and an input, output, and leak in compartment-1 (and no other inputs, outputs, or leaks). Let $\widetilde{M}$ be the model obtained from $M$ by removing the leak.

The input-output equation of $M$ has the following form (by Proposition 2.3):
\[ y_1^{(n)} + c_{n-1} y_1^{(n-1)} + \cdots + c_1 y_1' + c_0 y_1 = u_1^{(n-1)} + d_{n-2} u_1^{(n-2)} + \cdots + d_1 u_1' + d_0 u_1. \]
The compartmental matrix for $M$, denoted by $A$, comes from adding $-a_{01}$ to the $(1,1)$-entry of $\widetilde{A}$, the compartmental matrix for $\widetilde{M}$. Therefore, the matrices $(\partial I - A)^{11}$ and $(\partial I - \widetilde{A})^{11}$ are equal. So, by Proposition 2.3, the right-hand side of the input-output equation for $\widetilde{M}$ coincides with that for $M$ (and these coefficients in common do not involve the leak parameter $a_{01}$). We therefore write the input-output equation for $\widetilde{M}$ as follows:
\[ y_1^{(n)} + \widetilde{c}_{n-1} y_1^{(n-1)} + \cdots + \widetilde{c}_1 y_1' + 0 = u_1^{(n-1)} + d_{n-2} u_1^{(n-2)} + \cdots + d_1 u_1' + d_0 u_1. \]
The coefficient of $y_1$ is 0, because this coefficient is the constant term of $\det(\partial I - \widetilde{A})$, which is $\pm \det \widetilde{A}$, and this determinant is 0 (the column sums of $\widetilde{A}$ are zero).

The resulting coefficient map for $M$ is:
\[ c_M := (c_{n-1}, \dots, c_1, c_0, d_{n-2}, \dots, d_1, d_0) : \mathbb{R}^{|E|+1} \to \mathbb{R}^{2n-1}, \]
where $E$ is the edge set of $G$; and the coefficient map for $\widetilde{M}$ is:
\[ c_{\widetilde{M}} := (\widetilde{c}_{n-1}, \dots, \widetilde{c}_1, d_{n-2}, \dots, d_1, d_0) : \mathbb{R}^{|E|} \to \mathbb{R}^{2n-2}. \]
From [10, Theorem 4.5], the coefficients $c_i$ and $d_i$ are sums of products of edge labels, where the sum is taken over all $(n-i)$-edge (respectively, $(n-i-1)$-edge) spanning incoming forests in the leak-augmented graph of $G$ (respectively, a related graph obtained by deleting compartment-1).
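The coefficients $\widetilde{c}_i$ have a similar interpretation, where now the leak-augmented graph is simply $G$, as $\widetilde{M}$ has no leaks. It is straightforward to check, then, that the following equalities hold:
\[ \begin{aligned} \widetilde{c}_{n-1} &= c_{n-1} - a_{01} \\ \widetilde{c}_{n-2} &= c_{n-2} - a_{01} d_{n-2} \\ &\ \ \vdots \\ \widetilde{c}_1 &= c_1 - a_{01} d_1 \\ 0 &= c_0 - a_{01} d_0. \end{aligned} \tag{23} \]
We claim that $c_M$, the coefficient map for $M$, is generically finite-to-one if and only if the following map is generically finite-to-one:
\[ \varphi := (a_{01}, \widetilde{c}_{n-1}, \dots, \widetilde{c}_1, d_{n-2}, \dots, d_1, d_0) : \mathbb{R}^{|E|+1} \to \mathbb{R}^{2n-1}. \]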
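The equalities (23) can be checked on a small instance. The following sketch is illustrative only: it assumes a 3-compartment catenary model with an input, output, and leak in compartment 1, uses arbitrary integer rates, and computes characteristic polynomials with the Faddeev-LeVerrier recursion (a standard method, chosen here for convenience and not taken from the paper). The helper names are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][q] for k in range(n)) for q in range(n)]
            for r in range(n)]

def charpoly(A):
    """Coefficients [p_0, ..., p_n] of det(s*I - A) = sum_k p_k s^k,
    computed by the Faddeev-LeVerrier recursion over the rationals."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    p = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A * M_{k-1} + p_{n-k+1} * I
        M = [[AM[r][q] + (p[n - k + 1] if r == q else 0) for q in range(n)]
             for r in range(n)]
        AMk = matmul(A, M)
        p[n - k] = -sum(AMk[r][r] for r in range(n)) / k
    return p

# Hypothetical catenary model 1 <-> 2 <-> 3 with leak a01 from compartment 1.
a21, a12, a32, a23, a01 = 2, 3, 5, 7, 11
A_tilde = [[-a21,  a12,           0],
           [ a21, -(a12 + a32),   a23],
           [ 0,    a32,          -a23]]
A = [row[:] for row in A_tilde]
A[0][0] -= a01  # adding the leak adds -a01 to the (1,1)-entry

c = charpoly(A)                                  # [c_0, c_1, c_2, 1]
c_tilde = charpoly(A_tilde)                      # [0, c~_1, c~_2, 1]
d = charpoly([row[1:] for row in A_tilde[1:]])   # [d_0, d_1, 1]

# (23) rearranged: c_k = c~_k + a01 * d_k, with d_{n-1} = 1 and c~_0 = 0.
identities_hold = all(c[k] == c_tilde[k] + a01 * d[k] for k in range(3))
```

The right-hand-side coefficients come from deleting row 1 and column 1, which is the same for $A$ and $\widetilde{A}$, so a single list `d` serves both models.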

To verify this claim, we first define the map:
\[ \nu : \mathbb{R}^{2n-1} \to \mathbb{R}^{2n-1}, \]
\[ (a_{01}, \widetilde{c}_{n-1}, \dots, \widetilde{c}_1, d_{n-2}, \dots, d_0) \mapsto (\widetilde{c}_{n-1} + a_{01},\ \widetilde{c}_{n-2} + a_{01} d_{n-2},\ \dots,\ \widetilde{c}_1 + a_{01} d_1,\ a_{01} d_0,\ d_{n-2}, \dots, d_0). \]
By the equations (23), we have that $\nu \circ \varphi = c_M$. Also, it is straightforward to check that $\det \mathrm{Jac}\, \nu = \pm d_0$, which is a nonzero polynomial, and so this Jacobian matrix is generically full rank. Our claim now follows directly.

Hence, since $c_M$ is generically finite-to-one, the Jacobian matrix of $\varphi$, displayed below (we order the variables with $a_{01}$ first, so it corresponds to the first column), generically has full rank equal to the number of parameters, $|E|+1$:
\[ \mathrm{Jac}\, \varphi = \begin{pmatrix} 1 & 0 \ \cdots \ 0 \\ 0 & \\ \vdots & \mathrm{Jac}\, c_{\widetilde{M}} \\ 0 & \end{pmatrix}. \]
Therefore, $\mathrm{Jac}\, c_{\widetilde{M}}$, the Jacobian matrix of the coefficient map for $\widetilde{M}$, generically has full rank equal to $|E|$. So, by Proposition 2.15, the model $\widetilde{M}$ is generically locally identifiable.

We now apply Proposition 4.6 to three well-known families of linear compartmental models: catenary (path graph) models, mammillary (star graph) models, and cycle models [8]. See Figures 2 and 3. For these models, adding a leak to compartment-1 yields a model that is generically locally identifiable [4, 10, 13, 14]. So, Proposition 4.6 yields the following result:

Proposition 4.7. These linear compartmental models are generically locally identifiable:
(1) the $n$-compartment catenary (path) model in Figure 2 (for $n \geq 2$),
(2) the $n$-compartment cycle model in Figure 3 (for $n \geq 3$), and
(3) the $n$-compartment mammillary (star) model in Figure 3 (for $n \geq 2$).

Figure 2. The catenary (path) model with $n$ compartments and no leaks, in which compartment-1 has an input and output (cf. [10, Figure 1]).

5. Examples of adding or removing inputs, outputs, leaks, or edges

This section compiles examples of linear compartmental models that show that when a leak or edge is added or when an input, output, leak, or edge is deleted, identifiability is sometimes preserved and sometimes lost (as summarized in Table 1).
Our examples also show, conversely, that when a leak or edge is deleted or when an input, output, leak, or edge is added, unidentifiability is sometimes preserved and sometimes lost (Table 1).

Figure 3. Two models with $n$ compartments and no leaks, where compartment-1 has an input and output (cf. [10, Figure 2]). Left: The cycle. Right: The mammillary (star).

In the following examples, the equations we assert are input-output equations can be checked by hand, using DAISY [3], or via Proposition 2.3 or Remark 2.6.

Example 5.1 (Add or delete a leak). We consider models $M$ and $\widetilde{M}$, where $\widetilde{M}$ is obtained from $M$ by adding a leak (or, equivalently, $M$ is obtained from $\widetilde{M}$ by deleting a leak). By Theorem 4.3, when $M$ is strongly connected and has inputs but no leaks, then if $M$ is identifiable, $\widetilde{M}$ is too.

Next, let $M$, $M'$, and $M''$ denote the models on the left, middle, and right, respectively:

[Diagrams of the three models, with leak labels $a_{01}$, $a_{01}$, $a_{02}$.]

The model $M$ is identifiable (the input-output equation is y 1 + y 1 = 0), while $M'$ is not (the input-output equation is y 1 + (a 01 + )y 1 = 0), nor is $M''$ (recall Example 3.13).

Example 5.1 does not give an example of an unidentifiable model $M$ and an identifiable model $\widetilde{M}$, where $\widetilde{M}$ has one more leak than $M$. We do not know whether such an example exists, so we pose the following question.

Question 5.2. For an unidentifiable linear compartmental model $M$, if one leak is added, is the resulting model always unidentifiable?

Example 5.3 (Add or delete an edge). We consider models $M$ and $\widetilde{M}$, where $\widetilde{M}$ comes from adding an edge to $M$ (or, equivalently, $M$ comes from deleting an edge of $\widetilde{M}$). In prior work, we showed that when both models are strongly connected, and the parameter of the deleted edge does not divide the singular-locus equation of $\widetilde{M}$, then deleting the edge preserves generic local identifiability (see [10, Example 3.2]). On the other hand, the model shown in the right-hand side of (24) in Example 5.5 below is identifiable [10], but deleting the edge labeled $a_{12}$ yields an unidentifiable model.