Distributed l1-state-and-fault estimation for multi-agent systems


Distributed l1-state-and-fault estimation for multi-agent systems
Kazumune Hashimoto, Michelle Chong and Dimos V. Dimarogonas, Senior Member, IEEE
arXiv:181.876v1 [math.OC] 2 Oct 2018

Abstract: In this paper, we propose a distributed state-and-fault estimation scheme for multi-agent systems. The proposed estimator is based on an l1-norm optimization problem, which is inspired by sparse signal recovery in the field of compressive sampling. Two theoretical results are given to analyze the correctness of the proposed approach. First, we provide a necessary and sufficient condition under which state and fault signals are correctly estimated. The result presents a fundamental limitation of the algorithm, showing how many faulty nodes can be tolerated while still ensuring correct estimation. Second, we provide a sufficient condition on the estimation error of fault signals when numerical errors in solving the optimization problem are present. An illustrative example is given to validate the effectiveness of the proposed approach.

I. INTRODUCTION

The problem of controlling multi-agent systems arises in many practical applications, such as vehicle platooning, formation control of autonomous vehicles and satellites, cooperative control of robots, and power networks, to name a few [1]. In multi-agent systems, designing an automatic framework for detecting anomalies or faults has attracted much attention in recent years, see, e.g., [2]. The motivation is twofold. First, since a multi-agent system consists of many interacting subsystems, even a few anomalous behaviors can have a significant impact on the overall system's performance. For example, in vehicle platooning, anomalous behaviors such as sudden braking or acceleration of a leading vehicle affect the behavior of the following vehicles, which may cause traffic accidents or traffic jams. Second, due to the many interactions over the communication network, a multi-agent system may be vulnerable to cyber attacks.
For example, the actuators of some nodes may be hacked by attackers over the communication network, allowing them to manipulate the controllers and force those nodes to behave anomalously. Therefore, detecting and estimating faults in multi-agent systems at an early stage is necessary to maintain safe and reliable real-time operation. So far, various formulations and approaches have been proposed to detect and estimate faults in multi-agent systems, see, e.g., [3]–[10]. Early works focus on designing detection schemes in a centralized manner, in which all nodes communicate with a central unit for detecting faulty signals, see, e.g., [3]–[5]. In recent years, there has been growing attention on designing decentralized or distributed fault detection and estimation schemes, see, e.g., [6]–[10].

The authors are with the School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 144 Stockholm, Sweden. This work is supported by the Knut and Alice Wallenberg Foundation.

For example, [6] and [7] proposed Unknown Input Observers (UIO) for detecting faults in a decentralized and a distributed manner, respectively. In [8], a distributed fault estimation scheme was proposed using H∞ optimization, which trades off sensitivity to faults against robustness to disturbances. In [10], a consensus-based decentralized observer is designed for fault detection. Moreover, a distributed fault detection scheme based on UIO for multi-agent Linear Parameter Varying (LPV) systems is proposed in [14]. In this paper, we are interested in developing a distributed fault detection and estimation strategy for multi-agent systems. In particular, we consider the situation where each node is able to measure relative state information with respect to its neighbors. Some approaches have already been proposed for this problem set-up, see, e.g., [9]–[13], where the references include centralized, decentralized, and distributed schemes.
For example, in [11] a coordinate transformation is introduced to extract the observable subspace, and a decentralized sliding-mode observer is designed to estimate faulty signals. In [13], a distributed fault detection scheme is proposed by employing Linear Matrix Inequalities (LMIs) that trade off the sensitivity of residuals against robustness to system noise. In [10], a fault detection scheme is developed by introducing a set-valued observer, which is applied to two subsystems obtained by left-coprime factorization. In addition, a UIO-based event-triggered communication protocol was proposed in [9], where each node monitors whether its neighboring nodes are subject to faults. The contribution of this paper is to present a new distributed fault estimation scheme for multi-agent systems based on relative state measurements. The objective is to provide a distributed estimator (rather than a detector), such that each node estimates both states and faults. This objective thus differs from previous works such as [9]–[11], where only fault signals are detected or estimated. As we will see later, this is achieved by considering the case where a node assigned as a leader is able to measure additional information. The proposed estimator differs from previous works in the literature, and draws inspiration from compressive sampling [16]. The estimator can handle the case when multiple faults arise. Thus, the approach is advantageous over [9]–[11], where neither a systematic estimation scheme for multiple faults nor a theoretical analysis is given. Moreover, we provide a quantitative analysis of how many faulty nodes can be tolerated while still providing a correct estimation. While [13] gives a sufficient condition for designing a suitable estimator for multiple faults, our result is more explicit and intuitive, giving a necessary and sufficient condition under which the estimator provides

correct estimations. Specifically, the contribution of this paper is as follows:
1) We propose a fault and state estimator for multi-agent systems based on an l1-norm optimization problem, which is inspired by sparse signal recovery in the field of compressive sampling [16]. The optimization problem is formulated as a Basis Pursuit [17], for which several numerical solvers can be employed to solve the problem efficiently.
2) We provide two quantitative analyses of the correctness of the proposed estimator. First, we provide a necessary and sufficient condition such that states and faulty signals are correctly estimated for all times. Indeed, this illustrates a fundamental limitation of the proposed algorithm, showing how many faulty nodes can be tolerated while still providing correct estimation. Second, we consider the case when numerical errors in solving the optimization problem are present for some time, and provide error bounds for the state-and-fault estimation. In particular, we show that the estimation error is bounded in terms of not only the previous state estimate but also the number of faulty nodes. The result also shows how the estimation errors propagate over time.
3) We present a distributed implementation to solve the l1-norm optimization problem. To this aim, we employ the approach presented in [18], which utilizes the alternating direction method of multipliers (ADMM). As in [18], the equivalence between the centralized and the distributed solution is shown.
Since the approach employs l1-norm optimization, the analysis provided in this paper is also relevant to secure estimation for Cyber-Physical Systems (CPSs), see, e.g., [19]–[21]. For example, [19], [21] provide a necessary and sufficient condition to correctly estimate the initial state under adversarial attacks on sensors.
They showed that the number of anomalous sensors must be less than half the total number of sensors to provide a correct state estimation. While the above results are relevant to the analysis presented in this paper, a fundamental difference is that they consider only observable systems. In contrast, we show in this paper that correct fault and state estimation can be achieved even when the overall system is unobservable, which is not investigated in the previous analyses. Moreover, the analytical proof is different, as we make use of the null-space property of the measurement matrix.

The remainder of this paper is organized as follows. In Section II, we present some preliminaries on graph theory and the l1-norm optimization problem. In Section III, the problem formulation and approach are given. In Section IV, we provide the analysis for the case when the previous state estimate is error-free. In Section V, we provide the analysis of the correctness of the proposed estimator. In Section VI, a distributed implementation to solve the estimation problem is given. In Section VII, an illustrative example shows the effectiveness of the proposed approach. Conclusions are given in Section VIII.

Notation: Let R, R+, N, N+ be the non-negative reals, positive reals, non-negative integers, and positive integers, respectively. For a given vector x ∈ R^n, denote by x(i) the i-th component of x. Let ||x||_1 be the l1-norm of x, i.e., ||x||_1 = |x(1)| + |x(2)| + ... + |x(n)|. For a given matrix A ∈ R^{n×m}, we denote by A(i,j) the (i,j)-component of A. For a given M ∈ N+, denote by 1_M ∈ R^M the M-dimensional vector whose components are all 1. For a given set T, denote by |T| the cardinality of T. For given T ⊆ {1,...,n} and x ∈ R^n, x is called T-sparse if all components indexed by the complement of T are zero, i.e., x(i) = 0 for all i ∈ {1,...,n}\T. For a given x ∈ R^n and a set T ⊆ {1,...,n}, denote by x_T ∈ R^{|T|} the vector obtained from x by extracting all components indexed by T. Given A ∈ R^{m×n}, denote by rank(A) the rank of A.
Moreover, denote by ker(a) the null space of A, i.e., ker(a) = {x R n : Ax = }. A. Graph theory II. PRELIMINARIES Let G = (V, E) denote a directed graph, where V = {1, 2,..., M} is the set of nodes and E = {e 1, e 2,..., e N } V V is the set of edges. The graph is strongly connected if for every pair of nodes there exists a path between them. The graph is weakly connected if for every pair of nodes, when removing all orientations in the graph, there exists a path between them. For a given i V, let N i V be the set of neighboring nodes of i, i.e., N i = {j V : e = {i, j} E}. The incidence matrix D = D(G) is the {, ±1} matrix, where D (i,j) = 1 if the node i is the head of the edge e j, D (i,j) = 1 if the node i is the tail of the edge e j, and D (i,j) = otherwise. For a given G = (V, E) with M number of nodes, it is shown that rank(d) = M 1 if G is weakly connected. The nullspace of the incidence matrix is given by ker(d T ) = γ1 M, where γ R (see, e.g., Chapter 2 in [22]). B. The Null-Space Property and l 1 -norm optimization In what follows, we provide the notion of Null-Space Property (NSP) and its useful result for the l 1 -norm optimization problem. Definition 1 (The Null-Space Property). For given A R m n and T {1, 2,..., n}, A is said to satisfy the Null-Space Property (NSP) for T (or T -NSP for short), if for every v ker(a)\{}, it holds that where T c = {1,..., n}\t. v T 1 < v T c 1, (1) The NSP is a key property to check whether the sparse signal can be reconstructed based on the l 1 -norm optimization problem: Theorem 1 (l 1 -reconstruction theorem). For given A R m n and T {1,..., n}, every T -sparse vector x R n is a unique solution to the following optimization problem: x x 1 s.t., Ax = Ax, (2) if and only if A satisfies T -NSP.

The above theorem indicates that every T-sparse vector can be reconstructed by solving the l1-norm optimization problem in (2) if and only if the matrix A satisfies the T-NSP. The proof of Theorem 1 follows the same lines as [23], [24] and is given in the Appendix.

III. PROBLEM FORMULATION

A. Dynamics

We consider a network of M inter-connected nodes, which is modeled by a graph G = (V, E), where V = {1, 2,..., M} is the set of nodes and E ⊆ V × V is the set of edges. Here, each edge e = {i, j} ∈ E indicates the sensing and communication capabilities of node i with respect to j. More specifically, if {i, j} ∈ E, node i is able to measure relative state information with respect to j, as well as to transmit it to j. We assume that the network is weakly connected. Moreover, we assume that there exist one leader and M − 1 followers; without loss of generality, the node labeled 1 ∈ V is the leader and the others are the followers. The dynamics of node i ∈ V is described by the following discrete-time system:

x_i(k) = A_i x_i(k−1) + B_i u_i(k−1) + f_i(k), (3)

for k ∈ N, where x_i ∈ R^n is the state, u_i ∈ R^m is the control input, and f_i ∈ R^n is the signal indicating the occurrence of faults, i.e., f_i(k) ≠ 0 when node i is subject to a fault at k, and f_i(k) = 0 otherwise. The faulty signals can be viewed as system faults on the dynamics in (3), or actuator faults caused by physical effects or by cyber attacks over the communication network. Note that the state and input dimensions are n and m for all nodes. We assume that the control input is known to node i for all k ∈ N. Let I_k ⊆ {1,..., M} be the unknown set of nodes that are subject to faults at k ∈ N, i.e.,

I_k = {i ∈ {1,..., M} : f_i(k) ≠ 0}. (4)

Note that the set I_k can be time-varying, i.e., the set of faulty nodes may change over time. The output equation depends on whether the node is the leader or one of the followers.
For the followers, each node is able to measure relative state information from its neighbors by using on-board sensors, i.e., for all i ∈ V\{1},

y_ij(k) = x_i(k) − x_j(k), j ∈ N_i, (5)

for k ∈ N, where y_ij(k), j ∈ N_i are the output measurements. The set of outputs collected at node i can be expressed as

y_i(k) = C_i x(k), (6)

where y_i(k) ∈ R^{n|N_i|} is the vector collecting the measurement outputs for all j ∈ N_i, C_i is the matrix defined through (5), and x(k) = [x_1(k)^T x_2(k)^T ... x_M(k)^T]^T. Regarding the measurement outputs of the leader, we consider the following two cases. First, it can measure not only the relative state information from its neighbors but also its own state, i.e.,

y_1j(k) = x_1(k) − x_j(k), j ∈ N_1, (7)
y_11(k) = x_1(k), (8)

where y_1j(k), j ∈ N_1 and y_11(k) are the output measurements. If the leader is able to obtain both (7) and (8), we say that the leader is in active mode. Second, it can measure, similarly to the followers, only the relative state information from its neighbors, i.e.,

y_1j(k) = x_1(k) − x_j(k), j ∈ N_1, (9)

and then we say that the leader is in non-active mode. In practice, the leader's mode changes over time depending on the physical environment. For example, if the leader moves in an outdoor environment where it can utilize GPS to locate its state, the measurement in (8) is available and the leader is in active mode. On the other hand, if the leader enters an environment in which it cannot utilize GPS due to signal loss caused by surrounding structures (e.g., inside a tunnel, underground), the measurement in (8) is not available and the leader is in non-active mode. To indicate whether the leader is active or non-active, let a_1(k) ∈ {0, 1} be given by a_1(k) = 0 if the leader is in non-active mode, and a_1(k) = 1 if the leader is in active mode. Regarding the leader's mode, we further assume the following.

Assumption 1.
The leader is in active mode at the initial time, i.e., a_1(0) = 1. Moreover, there is no fault at k = 0, i.e., I_0 = ∅.

Note that the leader can be in either active or non-active mode for all k ∈ N\{0}. As we will see in the analysis of Section IV, the assumption of being in active mode at k = 0 leads to correct fault and state estimation for all time steps afterwards. As in (6), the set of outputs for the leader can be expressed as

y_1(k) = C_10 x(k), if a_1(k) = 0, (10)
y_1(k) = C_11 x(k), if a_1(k) = 1, (11)

where C_10 is the matrix obtained by collecting all measurements from (9), and C_11 is the matrix obtained from (7) and (8).

Remark 1. In this paper, we assume that only one node can be in active mode (i.e., only the leader can utilize GPS). The main reason for this is that we aim to reduce the communication cost of the implementation; for example, if more nodes were able to utilize GPS, the communication cost, and hence the energy consumption, might become a critical problem. Thus, the communication effort is minimized here so that only the leader utilizes GPS. While this assumption is reasonable, the proposed estimator and the theoretical analysis provided in later sections can be extended to the case when multiple nodes can utilize GPS. For details, see Remark 2.

Based on (3) and (6), the overall dynamics of all nodes can be expressed as

x(k) = Ax(k−1) + Bu(k−1) + f(k), (12)
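As a concrete instance of the stacked model, the following sketch (our illustration, not from the paper) builds the non-active and active measurement matrices for a three-node path graph with scalar states directly from the incidence matrix D, and checks the graph-theoretic facts recalled in Section II (rank(D) = M − 1 and ker(D^T) = span(1_M)).

```python
import numpy as np

# Incidence matrix of the path graph 1-2-3 (edges e1 = {1,2}, e2 = {2,3});
# +1 marks the head of an edge, -1 its tail, 0 otherwise.
D = np.array([[ 1,  0],
              [-1,  1],
              [ 0, -1]])
M = D.shape[0]

C0 = D.T                                   # non-active leader: relative measurements only
C1 = np.vstack([np.eye(M)[0], D.T])        # active leader: prepend x_1's absolute measurement

print(C0)                                  # rows: x1 - x2, x2 - x3
print(C1)
print(np.linalg.matrix_rank(D) == M - 1)   # weakly connected => rank M - 1
print(np.allclose(D.T @ np.ones(M), 0))    # 1_M spans ker(D^T)
```

The orientation of each edge is arbitrary here; flipping signs in a column of D changes the sign of one relative measurement but none of the rank or null-space properties.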

for k ∈ N, where

x(k) = [x_1(k)^T x_2(k)^T ... x_M(k)^T]^T, (13)
u(k) = [u_1(k)^T u_2(k)^T ... u_M(k)^T]^T, (14)
f(k) = [f_1(k)^T f_2(k)^T ... f_M(k)^T]^T, (15)

and A, B are matrices of appropriate dimensions. The measurement output is given by

y(k) = C_0 x(k) if a_1(k) = 0, and y(k) = C_1 x(k) if a_1(k) = 1, (16)

where y(k) = [y_1(k)^T y_2(k)^T ... y_M(k)^T]^T and

C_0 = [C_10^T C_2^T ... C_M^T]^T, C_1 = [C_11^T C_2^T ... C_M^T]^T. (17)

Note that the pair (A, C_0) is unobservable when all nodes are identical (i.e., A_1 = A_2 = ... = A_M), see e.g., [9]–[11], and such a case is included in the above system when the leader is non-active.

(Example): Suppose that V = {1, 2, 3} and {1, 2}, {2, 3} ∈ E, and the dynamics of node i is scalar and given by x_i(k+1) = x_i(k) + f_i(k), i ∈ {1, 2, 3}. Then, we have A = I_3, B = 0 and

C_0 = [1 −1 0; 0 1 −1], C_1 = [1 0 0; 1 −1 0; 0 1 −1]. (18)

B. State and fault estimation using l1-norm optimization

For each k ∈ N, we aim at estimating the state x(k) and the fault f(k), based on the current output measurement y(k). We start by formulating a centralized solution, which estimates them by employing all output measurements as in (16). A distributed strategy will be formulated later in this paper. Let x̂(k), f̂(k), k ∈ N be the state and the fault signals estimated at k, respectively. For each k ∈ N, we propose to solve the following optimization problem to obtain these estimates:

x̂(k) = arg min_x ||x − Ax̂(k−1) − Bu(k−1)||_1 s.t. y(k) = Cx, (19)

where C = C_0 if a_1(k) = 0 and C = C_1 if a_1(k) = 1. Without loss of generality, we set x̂(k−1) = 0 for k = 0. The estimate f̂(k) is then computed as

f̂(k) = x̂(k) − Ax̂(k−1) − Bu(k−1). (20)

In general, the term Ax̂(k−1) + Bu(k−1) is called the a priori estimate for k. Thus, the optimization problem in (19) aims at finding the state that is closest to the a priori estimate, subject to the constraint imposed by the output measurement y(k).
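A one-step implementation of (19)–(20) can be sketched as follows. This is our own illustration: the helper name l1_estimate and the LP reformulation via the split z = z⁺ − z⁻ are our choices, not the paper's; the numerical example uses a three-node integrator network with a non-active leader, where a single fault (fewer than half the nodes) is recovered even though the pair (A, C_0) is unobservable.

```python
import numpy as np
from scipy.optimize import linprog

def l1_estimate(A, B, C, y, x_prev, u_prev):
    """One step of (19)-(20): minimize ||x - A x_prev - B u_prev||_1
    subject to y = C x, solved as Basis Pursuit in z = x - (A x_prev + B u_prev)."""
    x_prior = A @ x_prev + B @ u_prev            # a priori estimate
    y_tilde = y - C @ x_prior                    # shifted measurement
    n = A.shape[0]
    res = linprog(np.ones(2 * n),                # min 1^T (z_plus + z_minus)
                  A_eq=np.hstack([C, -C]), b_eq=y_tilde,
                  bounds=[(0, None)] * (2 * n))
    z = res.x[:n] - res.x[n:]
    return x_prior + z, z                        # state estimate, fault estimate

# Three-node integrator example, non-active leader (relative measurements only):
A, B = np.eye(3), np.zeros((3, 1))
C0 = np.array([[1., -1, 0], [0, 1, -1]])
x_prev = np.array([2., 4, 6])                    # assumed error-free previous estimate
f_true = np.array([-3., 0, 0])                   # single fault: |I_k| = 1 < M/2
y = C0 @ (x_prev + f_true)
x_hat, f_hat = l1_estimate(A, B, C0, y, x_prev, np.zeros(1))
print(f_hat)                                     # recovers f_true
```

Here C0 z = C0 f_true forces z = [t − 3, t, t]; the l1 objective |t − 3| + 2|t| is uniquely minimized at t = 0, so the LP returns the true fault despite the underdetermined constraint.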
Letting z = x − Ax̂(k−1) − Bu(k−1), the problem in (19) becomes

min_z ||z||_1 s.t. ỹ(k) = Cz, (21)

where ỹ(k) = y(k) − C(Ax̂(k−1) + Bu(k−1)), which is indeed the well-known Basis Pursuit (BP) problem [17]. Hence, various numerical solvers can be applied to obtain the solution to (21), both in a centralized manner [25], [26] and in a distributed manner [18] (see also Section VI for the distributed formulation). The optimization problem in (19) is closely related to existing estimation methodologies. To see this, consider the following weighted l2-norm optimization problem:

min_x ||x − Ax̂(k−1) − Bu(k−1)||^2_{P^{−1}} + ||y(k) − Cx||^2_{R^{−1}},

where R, P are positive definite matrices. The solution to this optimization problem is given by

x̂(k) = x̂⁻(k) + PC^T(R + CPC^T)^{−1}(y(k) − Cx̂⁻(k)), (22)

where x̂⁻(k) = Ax̂(k−1) + Bu(k−1), which is indeed the Kalman filter update. Thus, the optimization problem in (19) is, roughly speaking, a modified version of the Kalman filter utilizing the l1-norm in the objective function. The utilization of the l1-norm potentially provides a good estimator of faults, as the following example illustrates.

(Example): Suppose that V = {1, 2, 3} and {1, 2}, {2, 3} ∈ E, and the dynamics of node i is scalar and given by x_i(k+1) = x_i(k) + f_i(k), i ∈ {1, 2, 3}. As in the example of Section III-A, we have A = I_3, B = 0, and C_0, C_1 are given by (18). Assume that a_1(k) = 1, k ∈ [0, 19] and a_1(k) = 0, k ∈ [20, 40]. Note that the pair (A, C_0) is unobservable. The initial state is x(0) = [2; 4; 6] ∈ R^3 and the evolution of the states is illustrated in Fig. 1 (blue solid lines). As shown in the figure, it is assumed that a fault occurs at k = 30 on x_1(k) with f_1(k) = 3. The estimation results obtained by applying (19) and the Kalman filter are also illustrated in Fig. 1 (left figure).
While both approaches provide reasonably good estimates for a while, the l1-norm optimization approach dramatically outperforms the Kalman filter when the fault occurs at k = 30. Intuitively, this is due to the fact that the proposed approach utilizes the l1-norm objective function; if x̂(k−1) ≈ x(k−1), which may hold in this example since the system is observable for a while, we then have x(k) − Ax̂(k−1) − Bu(k−1) ≈ f(k), and solving (19) may yield a good estimate, as f(k) is sparse with only the first component taking non-zero values. Although the l1-norm optimization approach is a good estimator in the above example, it fails when more nodes are faulty. To illustrate this, the right figure of Fig. 1 shows the result when both x_1 and x_2 are faulty at the same time k = 30. Clearly, the l1-optimization approach fails to reconstruct states and faults, which shows that the estimation performance depends on the number of faulty nodes. Additionally, since x̂(k−1) is utilized in the estimator (19), the estimation accuracy may also depend on the estimation error at the previous time step; if this error is too large, the estimation accuracy may deteriorate. Motivated by the above observations, in this paper we provide answers to the following questions.

Fig. 1. Example of applying (19) and the Kalman filter when one node is faulty (left) and two nodes are faulty (right). The black vertical line indicates the time step when the system becomes unobservable (k = 20).

How many faulty nodes can be tolerated while still providing correct estimation? How does the estimation error at k − 1 affect the estimation error at k?

IV. ANALYSIS OF THE ESTIMATOR

In this section we provide a quantitative analysis of the proposed estimator as the answer to the first question posed in the previous section. Recall the optimization problem (19):

min_x ||x − Ax̂(k−1) − Bu(k−1)||_1 s.t. y(k) = Cx, (23)

for k ∈ N, where C = C_0 if a_1(k) = 0 and C = C_1 if a_1(k) = 1. Suppose that Assumption 1 holds (a_1(0) = 1, I_0 = ∅), and consider the unknown sets of faulty nodes I_k ⊆ {1,..., M} for k ∈ N\{0}, which means that f_i(k) ≠ 0 for i ∈ I_k and f_i(k) = 0 otherwise. Let F_{I_k} ⊆ R^{nM} be given by

F_{I_k} = {f(k) ∈ R^{nM} : f_i(k) ≠ 0, i ∈ I_k and f_i(k) = 0, i ∉ I_k}. (24)

That is, F_{I_k} is the domain of all f(k) when the set of faulty nodes is given by I_k. Then, we say that the estimator is correct if the estimator in (23) yields estimates identical to the actual values, i.e., for all k ∈ N\{0}, it follows that x̂(k) = x(k), f̂(k) = f(k), for all f(k) ∈ F_{I_k} and u(k) ∈ R^{mM}. The following result presents a necessary and sufficient condition for this as the main result of this section:

Theorem 2. Consider the estimator in (23), and suppose that Assumption 1 holds. Then, the following statements are equivalent:
(a) |I_k| < M/2, for all k ∈ N\{0}.
(b) x̂(k) = x(k), f̂(k) = f(k), for all f(k) ∈ F_{I_k}, u(k−1) ∈ R^{mM}, a_1(k) ∈ {0, 1}, k ∈ N\{0}.
In essence, Theorem 2 indicates that state and fault signals are correctly estimated for all k ∈ N\{0}, regardless of the values of f(k), the control inputs u(k−1), and the leader's mode a_1(k), if and only if the number of faulty nodes is less than half the total number of nodes for all k ∈ N\{0}. In other words, the estimator does not provide a correct estimation if the number of faulty nodes is greater than or equal to M/2. This fundamental limitation is similar to the ones presented in [19]–[21], which showed that the number of attacks on sensors that can be tolerated while still providing a correct estimation cannot exceed half the total number of sensors. While Theorem 2 exhibits some similarities to these previous works, our result is different in the following sense. The previous results consider reconstructability of states under sensor attacks only for observable systems. In contrast, as described in Section III-A, we include here the case when the overall system becomes unobservable (i.e., the leader is non-active) due to the constraint that only relative information is available. As we will see below, this can be shown by deriving an explicit condition for the null-space property of the measurement matrix C.

In order to prove Theorem 2, we resort to the following lemma:

Lemma 1. For a given k ∈ N\{0}, suppose that x̂(k−1) = x(k−1) holds. Then, the following statements are equivalent:
(a) |I_k| < M/2.
(b) x̂(k) = x(k), f̂(k) = f(k), for all f(k) ∈ F_{I_k}, u(k−1) ∈ R^{mM}, a_1(k) ∈ {0, 1}.

Lemma 1 indicates that if there is no estimation error at k − 1, a correct estimation is obtained at k if and only if |I_k| < M/2. Let us provide the proof of Lemma 1. To this end, several steps are required to rewrite the optimization problem in (23). Since x(k−1) − x̂(k−1) = 0, we have x(k) = Ax̂(k−1) + Bu(k−1) + f(k) and so y(k) = C(Ax̂(k−1) + Bu(k−1) + f(k)). Thus, the optimization problem becomes

min_x ||x − Ax̂(k−1) − Bu(k−1)||_1 (25)
s.t. C(Ax̂(k−1) + Bu(k−1) + f(k) − x) = 0.
Letting $z = x - A\hat{x}(k-1) - Bu(k-1)$ be the new decision variable, we obtain

$\min_z \|z\|_1$  s.t. $\bar{C}f(k) = \bar{C}z$,  (27)

where $\bar{C} = C_0$ if $a_1(k) = 0$ and $\bar{C} = C_1$ if $a_1(k) = 1$. Let $\hat{z}(k)$ be the solution to (27). Since $z = x - A\hat{x}(k-1) - Bu(k-1)$ and $\hat{f}(k)$ is given by (20), we have $\hat{z}(k) = \hat{f}(k)$. Let $z = [z_1^T, z_2^T, \ldots, z_M^T]^T$, where $z_i \in \mathbb{R}^n$, $i \in \{1, \ldots, M\}$, represents the variable for node $i$. In the following, we rearrange the vectors $z$, $f(k)$ and the matrix $\bar{C}$ as follows. Let $T_i \subset \{1, \ldots, nM\}$, $i \in \{1, 2, \ldots, n\}$, be given by $T_i = \{i, i+n, i+2n, \ldots, i+(M-1)n\}$, and rearrange $z$ as $z_p = [z_{p_1}^T, z_{p_2}^T, \ldots, z_{p_n}^T]^T$, where $z_{p_i} = z_{T_i}$, $i \in \{1, 2, \ldots, n\}$. For example, if $M = 3$ and $n = 2$, we have

$z = [z_{11}; z_{12}; z_{21}; z_{22}; z_{31}; z_{32}] \in \mathbb{R}^6$,  (28)
$z_p = [z_{11}; z_{21}; z_{31}; z_{12}; z_{22}; z_{32}]$,  (29)

where $z_1 = [z_{11}; z_{12}]$, $z_2 = [z_{21}; z_{22}]$, $z_3 = [z_{31}; z_{32}]$, and $z_{p_1} = [z_{11}; z_{21}; z_{31}]$, $z_{p_2} = [z_{12}; z_{22}; z_{32}]$. Roughly speaking, $z_{p_i}$ is a vector collecting the $i$-th component of all the nodes. Similarly, rearrange $f(k)$ as $f_p = [f_{p_1}^T, f_{p_2}^T, \ldots, f_{p_n}^T]^T$, where $f_{p_i} = f_{T_i}$, $i \in \{1, 2, \ldots, n\}$. Note

that due to the rearrangement of $f(k)$, $f_i(k) \neq 0$ implies that $(f_p)_{\tilde{T}_i} \neq 0$, where $\tilde{T}_i = \{i, i+M, i+2M, \ldots, i+(n-1)M\}$. For example, if we have

$f = [f_{11}; f_{12}; f_{21}; f_{22}; f_{31}; f_{32}] \in \mathbb{R}^6$,  (30)
$f_p = [f_{11}; f_{21}; f_{31}; f_{12}; f_{22}; f_{32}]$,  (31)

with $f_{p_1} = [f_{11}; f_{21}; f_{31}]$ and $f_{p_2} = [f_{12}; f_{22}; f_{32}]$, then $f_1 \neq 0$ implies $(f_p)_{\tilde{T}_1} \neq 0$, where $\tilde{T}_1 = \{1, 1+M\} = \{1, 4\}$. Thus, letting $\tilde{\mathcal{F}}_{I_k}$ be given by

$\tilde{\mathcal{F}}_{I_k} = \{f_p(k) \in \mathbb{R}^{nM} : (f_p(k))_{\tilde{T}_i} \neq 0,\ i \in I_k \text{ and } (f_p(k))_{\tilde{T}_i} = 0,\ i \notin I_k\}$,  (32)

it follows that

$f(k) \in \mathcal{F}_{I_k} \iff f_p(k) \in \tilde{\mathcal{F}}_{I_k}$.  (33)

Note that $f_p(k) \in \tilde{\mathcal{F}}_{I_k}$ implies that $f_p(k)$ is $\tilde{T}$-sparse, where $\tilde{T} = \cup_{i \in I_k} \tilde{T}_i$. Moreover, by rearranging the columns of $\bar{C}$ according to the rearrangement for $z$, as well as suitably rearranging the rows, one can always construct the matrix $C_{p0}$ for the non-active mode ($a_1(k) = 0$) and $C_{p1}$ for the active mode ($a_1(k) = 1$) as the block-diagonal matrices

$C_{p0} = \mathrm{blockdiag}(D_1, D_1, \ldots, D_1)$,  (34)
$C_{p1} = \mathrm{blockdiag}(D_2, D_2, \ldots, D_2)$,  (35)

where $D_1 = D^T$ and $D_2 = [e, D]^T$, with $D$ being the incidence matrix of the graph $\mathcal{G}$ and $e = [1, 0, 0, \ldots, 0]^T \in \mathbb{R}^M$. For example, suppose that $C_0$, $C_1$ are given by

$C_0 = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}$, $C_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}$,

which implies that $\{1, 2\} \in \mathcal{E}$ and the incidence matrix is given by $D = [1, -1]^T$. By rearranging the columns of the above matrices according to the one for $z$, and by exchanging the second and third rows of the rearranged $C_1$, we obtain

$C_{p0} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$, $C_{p1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$,

which can be checked to be of the form (34) and (35), respectively. Based on the above rearrangement, the optimization problem becomes

$\min_{z_p} \|z_p\|_1$  s.t. $\bar{C}_p f_p(k) = \bar{C}_p z_p$,  (36)

where $\bar{C}_p = C_{p0}$ if $a_1(k) = 0$ and $\bar{C}_p = C_{p1}$ if $a_1(k) = 1$. Let $\hat{f}_p(k)$ be the optimal solution of $z_p$ to (36). Note that the problem is invariant under the above rearrangements.
That is, letting $\hat{z}(k) = \hat{f}(k)$ and $\hat{f}_p(k) = [\hat{f}_{p_1}(k)^T, \hat{f}_{p_2}(k)^T, \ldots, \hat{f}_{p_n}(k)^T]^T$ be the solutions to (27) and (36), respectively, it follows that $\hat{f}_{p_i}(k) = \hat{f}_{T_i}(k)$, $i \in \{1, \ldots, n\}$, i.e., the optimal solutions to (27) and (36) are equivalent under the above rearrangements. Thus, we obtain

$f(k) = \hat{f}(k),\ \forall f(k) \in \mathcal{F}_{I_k} \iff f_p(k) = \hat{f}_p(k),\ \forall f_p(k) \in \tilde{\mathcal{F}}_{I_k}$.  (37)

Based on the above notations, the proof of Lemma 1 is provided as follows.

(Proof of Lemma 1): (a) $\Rightarrow$ (b): Suppose that $|I_k| < M/2$ holds. For the active mode case $a_1(k) = 1$, we have $\bar{C}_p = C_{p1}$, and since the graph is weakly connected, it is shown from (35) that $\mathrm{rank}(C_{p1}) = nM$ and $\ker(C_{p1}) = \{0\}$. This means that the $z_p$ satisfying the constraint $C_{p1}(f_p(k) - z_p) = 0$ is a single point, given by $z_p = f_p(k)$, which shows that the optimization problem in (36) yields $\hat{f}_p(k) = f_p(k)$. This implies that $\hat{f}(k) = f(k)$, $\forall f(k) \in \mathcal{F}_{I_k}$, $\forall u(k-1) \in \mathbb{R}^{mM}$. Moreover, since $\hat{f}(k) = f(k)$, it follows that

$\hat{x}(k) = A\hat{x}(k-1) + Bu(k-1) + \hat{f}(k) = Ax(k-1) + Bu(k-1) + f(k) = x(k)$,  (38)

which means that the state $x(k)$ is also correctly estimated. Thus, for the active mode case, the state and fault signals are correctly estimated by solving (19). Let us now consider the non-active mode case $\bar{C}_p = C_{p0}$. Let $\tilde{T}_i = \{i, i+M, i+2M, \ldots, i+(n-1)M\}$, $i \in \{1, \ldots, M\}$, and $\tilde{T} = \cup_{i \in I_k} \tilde{T}_i$. Since $\bar{C}_p = C_{p0}$ is given by (34) and the graph is weakly connected, we obtain

$\ker(C_{p0}) = \{[\gamma_1 \mathbf{1}_M^T, \gamma_2 \mathbf{1}_M^T, \ldots, \gamma_n \mathbf{1}_M^T]^T : \gamma_i \in \mathbb{R},\ i \in \{1, \ldots, n\}\}$,  (39)

where we have used the fact that $\ker(D^T) = \{\gamma \mathbf{1}_M : \gamma \in \mathbb{R}\}$ (see Section II-A). Thus, it follows that

$\|v_{\tilde{T}_i}\|_1 = \sum_{j=1}^n |\gamma_j|$,  (40)

for any $v \in \ker(C_{p0})$. Thus, we obtain

$\|v_{\tilde{T}}\|_1 = \sum_{i \in I_k} \|v_{\tilde{T}_i}\|_1 = |I_k| \sum_{j=1}^n |\gamma_j|$,  (41)
$\|v_{\tilde{T}^c}\|_1 = (M - |I_k|) \sum_{j=1}^n |\gamma_j|$,  (42)

where $\tilde{T}^c = \{1, 2, \ldots, nM\} \setminus \tilde{T}$. Since $|I_k| < M/2$, it follows that $\|v_{\tilde{T}}\|_1 < \|v_{\tilde{T}^c}\|_1$, $\forall v \in \ker(C_{p0})\setminus\{0\}$, i.e., $C_{p0}$ satisfies

the $\tilde{T}$-NSP. Since $f_p(k) \in \tilde{\mathcal{F}}_{I_k}$ implies that $f_p(k)$ is $\tilde{T}$-sparse, it follows from Theorem 1 that $f_p(k) = \hat{f}_p(k)$, $\forall f_p(k) \in \tilde{\mathcal{F}}_{I_k}$. This implies that $\hat{f}(k) = f(k)$, $\forall f(k) \in \mathcal{F}_{I_k}$, $\forall u(k-1) \in \mathbb{R}^{mM}$. From (38), it then follows that $\hat{x}(k) = x(k)$. Hence, (a) $\Rightarrow$ (b) holds.

(b) $\Rightarrow$ (a): Suppose that (b) holds. To prove by contradiction, suppose that $|I_k| \geq M/2$ and consider the non-active mode case $a_1(k) = 0$. Since $|I_k| \geq M/2$, it follows from (41) and (42) that $\|v_{\tilde{T}}\|_1 \geq \|v_{\tilde{T}^c}\|_1$, $\forall v \in \ker(C_{p0})\setminus\{0\}$, i.e., $C_{p0}$ does not satisfy the $\tilde{T}$-NSP. Thus, it follows from Theorem 1 that there exists $f_p(k) \in \tilde{\mathcal{F}}_{I_k}$ such that $\hat{f}_p(k) \neq f_p(k)$, which means that $\hat{f}(k) \neq f(k)$, and the optimization problem in (19) does not provide a correct estimation. This contradicts the statement in (b) that $\hat{f}(k) = f(k)$, $\forall f(k) \in \mathcal{F}_{I_k}$. Hence, (b) $\Rightarrow$ (a) holds.

It is now not difficult to show Theorem 2:

(Proof of Theorem 2): By Assumption 1, the leader is in the active mode at $k = 0$ ($\bar{C} = C_1$), and thus $\mathrm{rank}(C_{p1}) = \mathrm{rank}(C_1) = nM$ at $k = 0$. This implies that the optimization problem in (19) yields $\hat{x}(0) = x(0)$, since $C_1$ has a trivial kernel and the solution $x$ satisfying $y(0) = C_1 x$ is uniquely determined. To provide the proof, suppose that (a) holds, i.e., $|I_k| < M/2$ for all $k \in \mathbb{N}\setminus\{0\}$. Since $\hat{x}(0) = x(0)$, it follows from Lemma 1 that $\hat{x}(1) = x(1)$, $\hat{f}(1) = f(1)$, $\forall f(1) \in \mathcal{F}_{I_1}$, $\forall u(0) \in \mathbb{R}^{mM}$. Since $\hat{x}(1) = x(1)$, it then follows that $\hat{x}(2) = x(2)$, $\hat{f}(2) = f(2)$, $\forall f(2) \in \mathcal{F}_{I_2}$, $\forall u(1) \in \mathbb{R}^{mM}$. Similarly, it follows recursively by applying Lemma 1 that $\hat{x}(k) = x(k)$, $\hat{f}(k) = f(k)$, $\forall f(k) \in \mathcal{F}_{I_k}$, $\forall u(k-1) \in \mathbb{R}^{mM}$. Hence, (a) $\Rightarrow$ (b) holds. Conversely, suppose that (b) holds. Since $\hat{x}(0) = x(0)$, it follows from Lemma 1 that $|I_1| < M/2$. Similarly, it follows recursively by applying Lemma 1 that $|I_k| < M/2$, $\forall k \in \mathbb{N}\setminus\{0\}$. Hence, (b) $\Rightarrow$ (a) holds.

Remark 2 (On the extension to the case when multiple nodes can be in active mode). The result in Theorem 2 can be extended to the case when multiple nodes can be in active mode (i.e., multiple nodes can utilize the GPS).
To see this, suppose that nodes 1 and 2 can be in active mode, and let $a_1(k) \in \{0, 1\}$ and $a_2(k) \in \{0, 1\}$ be the indicators of the modes of nodes 1 and 2, respectively. In addition, assume, similarly to Assumption 1, that at least one of the nodes is active at $k = 0$, i.e., $a_1(0) = 1$ or $a_2(0) = 1$. Then, we can consider the following three cases: (i) both nodes are non-active, $a_1(k) = a_2(k) = 0$; (ii) both nodes are active, $a_1(k) = a_2(k) = 1$; (iii) exactly one of the nodes is active, $a_1(k) = 1$, $a_2(k) = 0$ or $a_1(k) = 0$, $a_2(k) = 1$. For case (i), the observation matrix after rearranging the columns and rows is given by (34), and so the same argument applies as in the proof of Lemma 1 for the non-active mode case. For case (ii), $C_{p1}$ is given by (35) with $D_2 = [e_1, e_2, D]^T$, where $e_1 = [1, 0, \ldots, 0]^T \in \mathbb{R}^M$ and $e_2 = [0, 1, 0, \ldots, 0]^T \in \mathbb{R}^M$. Since $\mathrm{rank}(C_{p1}) = nM$, we can apply the same argument as in the proof of Lemma 1 for the active mode case. For case (iii), $C_{p1}$ is given by (35) with $D_2 = [e_1, D]^T$ if $a_1(k) = 1$, or $D_2 = [e_2, D]^T$ if $a_2(k) = 1$. Since $\mathrm{rank}(C_{p1}) = nM$ in both cases, we can also apply the same argument as in the proof of Lemma 1 for the active mode case. Therefore, correct state-and-fault estimation is guaranteed when two nodes can be in active mode. The same argument applies when more nodes can be in active mode.

V. ANALYSIS OF THE ESTIMATOR IN THE PRESENCE OF NUMERICAL ERRORS

In the previous section, we provided a necessary and sufficient condition under which the proposed approach provides correct estimation. Correct estimates are obtained for all times when no numerical errors arise in solving the optimization problem; in practice, however, such errors may be present due to an insufficient number of iterations when solving (19). For example, suppose that for some $k-1$ an estimation error is present, so that $x(k-1) \neq \hat{x}(k-1)$.
Since $\hat{x}(k-1)$ is utilized for the estimation at $k$ in (19), the state-and-fault estimates at $k$ are no longer identical to the true values, i.e., $\hat{f}(k) \neq f(k)$, $\hat{x}(k) \neq x(k)$. In this section, we analyze how the estimation error at $k-1$ affects the estimation error at $k$, as the answer to the second question considered in Section III-B. The main result of this section is given as follows:

Theorem 3. For a given $k \in \mathbb{N}\setminus\{0\}$ and $d_{\max} \geq 0$, suppose that $\|\hat{x}(k-1) - x(k-1)\|_1 \leq d_{\max}$ holds, and let $I_k$ be any set of faulty nodes with $|I_k| < M/2$. Then, for every $f(k) \in \mathcal{F}_{I_k}$, $u(k-1) \in \mathbb{R}^{mM}$ and $a_1(k) \in \{0, 1\}$, it holds that

$\|f(k) - \hat{f}(k)\|_1 \leq \dfrac{2(M - |I_k|)}{M - 2|I_k|}\, \eta\, d_{\max}$,  (43)

where $\eta = \sum_{i=1}^{nM} \sum_{j=1}^{nM} |A^{(i,j)}|$.

Note that Theorem 3 for the case $d_{\max} = 0$ corresponds to the claim (a) $\Rightarrow$ (b) in Lemma 1. Similarly to the proofs of Lemma 1 and Theorem 2, several steps are required to modify the optimization problem in (23). Since $\|x(k-1) - \hat{x}(k-1)\|_1 \leq d_{\max}$, $x(k-1)$ can be expressed as $x(k-1) = \hat{x}(k-1) + \tilde{d}$, where $\|\tilde{d}\|_1 \leq d_{\max}$. Thus, we obtain $y(k) = \bar{C}(A\hat{x}(k-1) + A\tilde{d} + Bu(k-1) + f(k))$, and (23) becomes

$\min_z \|z\|_1$  s.t. $\bar{C}z = \bar{C}(f(k) - \tilde{d}')$,  (44)

where we let $z = x - A\hat{x}(k-1) - Bu(k-1)$ and $\tilde{d}' = -A\tilde{d}$. Thus, (44) differs from (27) in that the term $\tilde{d}'$, arising from the estimation error at $k-1$, is added to the constraint. From the definition of $\eta$, it follows that $\|\tilde{d}'\|_1 = \|A\tilde{d}\|_1 \leq \eta d_{\max}$. By rearranging the vectors $z$, $f(k)$, $\tilde{d}'$ and the matrix $\bar{C}$ by the procedure presented in Section IV, the problem becomes

$\min_{z_p} \|z_p\|_1$  s.t. $\bar{C}_p z_p = \bar{C}_p(f_p(k) - d_p)$,  (45)

where $d_p = [d_{p_1}^T, d_{p_2}^T, \ldots, d_{p_n}^T]^T$ with $d_{p_i} = \tilde{d}'_{T_i}$, $i \in \{1, 2, \ldots, n\}$. In (45), $\bar{C}_p = C_{p0}$ if $a_1(k) = 0$ and $\bar{C}_p = C_{p1}$ if $a_1(k) = 1$. It follows that $\|d_p\|_1 \leq \eta d_{\max}$ and

$\|f(k) - \hat{f}(k)\|_1 \leq \epsilon,\ \forall f(k) \in \mathcal{F}_{I_k} \iff \|f_p(k) - \hat{f}_p(k)\|_1 \leq \epsilon,\ \forall f_p(k) \in \tilde{\mathcal{F}}_{I_k}$,  (46)

for a given $\epsilon > 0$. The proof of Theorem 3 is now stated as follows.

(Proof of Theorem 3): Let $\hat{f}_p(k)$ be the optimal solution of $z_p$ to (45). For the active mode case, we have $\bar{C}_p = C_{p1}$, and it follows that $\mathrm{rank}(C_{p1}) = nM$ and $\ker(C_{p1}) = \{0\}$. Thus, we have $\hat{f}_p(k) = f_p(k) - d_p$ and $\|f_p(k) - \hat{f}_p(k)\|_1 \leq \eta d_{\max}$, which means that $\|f(k) - \hat{f}(k)\|_1 \leq \eta d_{\max}$. Since $\eta d_{\max} \leq \frac{2(M - |I_k|)}{M - 2|I_k|} \eta d_{\max}$, we obtain (43).

Consider now the non-active mode case, i.e., $\bar{C}_p = C_{p0}$. Since $\hat{f}_p(k)$ is the optimal solution, it follows that $\|\hat{f}_p(k)\|_1 \leq \|f_p(k)\|_1$. Since $C_{p0}\hat{f}_p(k) = C_{p0}(f_p(k) - d_p)$, there exists $v \in \ker(C_{p0})$ such that $\hat{f}_p(k) = f_p(k) - d_p + v$. Thus, it follows that

$\|f_p\|_1 \geq \|f_p - d_p + v\|_1 = \sum_{i \in \tilde{T}} |f_p^{(i)} - d_p^{(i)} + v^{(i)}| + \sum_{i \in \tilde{T}^c} |f_p^{(i)} - d_p^{(i)} + v^{(i)}| \geq \sum_{i \in \tilde{T}} \left(|f_p^{(i)} - d_p^{(i)}| - |v^{(i)}|\right) + \sum_{i \in \tilde{T}^c} \left(|v^{(i)}| - |f_p^{(i)} - d_p^{(i)}|\right)$,

where the time index $k$ for $f_p$ is omitted for brevity, and $\tilde{T} = \cup_{i \in I_k} \tilde{T}_i$ with $\tilde{T}_i = \{i, i+M, i+2M, \ldots, i+(n-1)M\}$, $i \in \{1, \ldots, M\}$. For the last inequality in the above, we used the triangle inequality ($|a + b| \geq |a| - |b|$). Thus, using $f_p^{(i)} = 0$, $i \in \tilde{T}^c$, as $f_p \in \tilde{\mathcal{F}}_{I_k}$, it follows that

$\|f_p\|_1 \geq \|(f_p)_{\tilde{T}} - (d_p)_{\tilde{T}}\|_1 - \|(d_p)_{\tilde{T}^c}\|_1 - \|v_{\tilde{T}}\|_1 + \|v_{\tilde{T}^c}\|_1$.  (47)

Moreover, it follows from (41) and (42) that $\|v_{\tilde{T}}\|_1 = \frac{|I_k|}{M - |I_k|}\|v_{\tilde{T}^c}\|_1$. By plugging this into (47), we obtain

$\|v_{\tilde{T}^c}\|_1 \leq \dfrac{M - |I_k|}{M - 2|I_k|}\left(\|f_p\|_1 - \|(f_p)_{\tilde{T}} - (d_p)_{\tilde{T}}\|_1 + \|(d_p)_{\tilde{T}^c}\|_1\right)$.  (48)

Since $\|f_p\|_1 = \|(f_p)_{\tilde{T}}\|_1 + \|(f_p)_{\tilde{T}^c}\|_1 = \|(f_p)_{\tilde{T}}\|_1$ and by using the triangle inequality, we obtain

$\|v_{\tilde{T}^c}\|_1 \leq \dfrac{M - |I_k|}{M - 2|I_k|}\left(\|(d_p)_{\tilde{T}}\|_1 + \|(d_p)_{\tilde{T}^c}\|_1\right) \leq \dfrac{M - |I_k|}{M - 2|I_k|}\, \eta\, d_{\max}$.

Therefore, it follows that

$\|\hat{f}_p(k) - f_p(k)\|_1 = \|{-d_p} + v\|_1 \leq \eta d_{\max} + \|v\|_1 = \eta d_{\max} + \left(1 + \dfrac{|I_k|}{M - |I_k|}\right)\|v_{\tilde{T}^c}\|_1 \leq \dfrac{2(M - |I_k|)}{M - 2|I_k|}\, \eta\, d_{\max}$,  (49)

which implies that (43) holds.

Based on Theorem 3, we can characterize how the estimation error of the states propagates depending on whether the leader is active or non-active. Suppose that the estimation error arises at $k-1$ and is bounded as $\|x(k-1) - \hat{x}(k-1)\|_1 \leq d_{\max}$, with $|I_k| < M/2$.
If the leader is non-active, it follows that

$\|x(k) - \hat{x}(k)\|_1 \leq \|A(x(k-1) - \hat{x}(k-1))\|_1 + \|f(k) - \hat{f}(k)\|_1 \leq \dfrac{3M - 4|I_k|}{M - 2|I_k|}\, \eta\, d_{\max}$,  (50)

where we have used $\|A(x(k-1) - \hat{x}(k-1))\|_1 \leq \eta d_{\max}$, and the upper bound of $\|f(k) - \hat{f}(k)\|_1$ is given by (43). On the other hand, if the leader is active, the optimization problem in (19) yields $\hat{x}(k) = x(k)$, since $\bar{C} = C_1$ has a trivial kernel and the solution $x$ satisfying $y(k) = C_1 x$ is uniquely determined. As a consequence, we obtain

$\|x(k) - \hat{x}(k)\|_1 \leq \begin{cases} 0, & \text{if } a_1(k) = 1, \\ \dfrac{3M - 4|I_k|}{M - 2|I_k|}\, \eta\, d_{\max}, & \text{if } a_1(k) = 0. \end{cases}$  (51)

The above relation implies that the estimation error is reset to $0$ if the leader switches to the active mode, while the error may diverge as long as the leader is in the non-active mode. Moreover, it follows that

$\dfrac{3M - 4|I_k|}{M - 2|I_k|} < \dfrac{3M - 4|I_k'|}{M - 2|I_k'|}$

for any $I_k$, $I_k'$ with $|I_k| < |I_k'| < M/2$. This implies that the estimation error may become larger as the number of faulty nodes increases. These observations will also be illustrated through the numerical simulations presented in Section VII.

VI. DISTRIBUTED IMPLEMENTATION

In this section we provide an algorithm that yields the solution to (19) in a distributed manner, which means that each node is able to estimate the states and faults of all nodes by coordinating only with its neighbors. The approach follows the distributed basis pursuit (DBP) algorithm [18], a solver for $\ell_1$-norm optimization problems based on the alternating direction method of multipliers (ADMM). In this section, we provide an overview of the methodology presented in [18] and apply it to our state-and-fault estimation problem. Assume that the network graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is connected, bidirectional and bipartite: $\{i, j\} \in \mathcal{E}$ implies $\{j, i\} \in \mathcal{E}$, and the vertices can be decomposed into two independent sets $\mathcal{M}_1$, $\mathcal{M}_2$, i.e., every edge in $\mathcal{E}$ connects a vertex in $\mathcal{M}_1$ to a vertex in $\mathcal{M}_2$.
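The bipartition $(\mathcal{M}_1, \mathcal{M}_2)$ required above can be computed (and bipartiteness verified) by a standard breadth-first two-coloring. The following minimal sketch uses an adjacency-dictionary graph representation, which is an illustrative choice rather than anything prescribed by the paper:

```python
from collections import deque

def bipartition(adj):
    """Two-color a connected graph given as {node: [neighbors]}.

    Returns the two independent sets (M1, M2) used to schedule the
    parallel updates, or raises ValueError if the graph is not bipartite.
    """
    color = {}
    start = next(iter(adj))
    color[start] = 0
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in color:
                color[j] = 1 - color[i]   # neighbors get the opposite color
                queue.append(j)
            elif color[j] == color[i]:
                raise ValueError("graph is not bipartite")
    M1 = sorted(i for i, c in color.items() if c == 0)
    M2 = sorted(i for i, c in color.items() if c == 1)
    return M1, M2

# A small tree (trees are always bipartite): 1-2, 1-3, 3-4
adj = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}
M1, M2 = bipartition(adj)
```

In particular, every tree (such as the communication graph used in the example of Section VII) admits such a bipartition.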
In addition, assume that the control law for node $i$ is based on the relative state information from its neighbors, and that all nodes share the same control law, i.e.,

$u_i(k) = \rho(r_{N_i}(k))$, $i \in \{1, \ldots, M\}$,  (52)

where $r_{N_i}(k) = \{x_i(k) - x_j(k) : j \in N_i\}$ and $\rho$ denotes the control law. The control law collecting all nodes is denoted as $u(k) = \kappa(x(k))$, where $\kappa = [\rho^T, \rho^T, \ldots, \rho^T]^T$. The following assumption is further required:

Assumption 2. The matrix $A$ is known to all nodes.

Recall that $y(k)$ and $C_0$, $C_1$ can be partitioned by rows as

$C_q = [C_{1q}^T, C_2^T, \ldots, C_M^T]^T$, $y(k) = [y_1(k)^T, y_2(k)^T, \ldots, y_M(k)^T]^T$,  (53)

for $q \in \{0, 1\}$, where $y_i$, $i \in \{1, \ldots, M\}$, denotes the output measurements that are available to node $i$. To employ the approach presented in [18] for the row-partition case, let $\chi_i \in \mathbb{R}^{nM}$, $i \in \{1, \ldots, M\}$, be the decision variable that will be optimized by node $i$. Roughly speaking, $\chi_i(k)$ represents the estimate of the state $x(k)$ obtained by node $i$. The optimization problem in (19) is then re-formulated as follows:

$\min \dfrac{1}{M} \sum_{i=1}^M \|\chi_i - A\hat{\chi}_i(k-1) - B\kappa(\hat{\chi}_i(k-1))\|_1$
subject to: $y_1(k) = C_{1q}\chi_1$, $y_i(k) = C_i\chi_i$, $i \in \{2, \ldots, M\}$, $\chi_i = \chi_j$, $\forall \{i, j\} \in \mathcal{E}$,  (54)

where $q = 0$ (resp. $q = 1$) if $a_1(k) = 0$ (resp. $a_1(k) = 1$), and $\hat{\chi}_i(k-1) \in \mathbb{R}^{nM}$ is the state estimate obtained by node $i$ at $k-1$. Let $\hat{\chi}_i(k) \in \mathbb{R}^{nM}$, $i \in \{1, \ldots, M\}$, be the solution to the optimization problem in (54). From the connectivity of the graph and the constraints in (54), global consistency of the optimal solution is satisfied, i.e., $\hat{\chi}_1(k) = \hat{\chi}_2(k) = \cdots = \hat{\chi}_M(k)$. Using the independent sets $\mathcal{M}_1$, $\mathcal{M}_2$, the optimization problem in (54) can be rewritten as

$\min \dfrac{1}{M} \sum_{i \in \mathcal{M}_1} \|\chi_i - A\hat{\chi}_i(k-1) - B\kappa(\hat{\chi}_i(k-1))\|_1 + \dfrac{1}{M} \sum_{i \in \mathcal{M}_2} \|\chi_i - A\hat{\chi}_i(k-1) - B\kappa(\hat{\chi}_i(k-1))\|_1$
subject to: $y_1(k) = C_{1q}\chi_1$, $y_i(k) = C_i\chi_i$, $i \in \{2, \ldots, M\}$, $(E_1^T \otimes I_{nM})\bar{\chi}_1 + (E_2^T \otimes I_{nM})\bar{\chi}_2 = 0$,  (55)

where $\bar{\chi}_1 \in \mathbb{R}^{nM|\mathcal{M}_1|}$ and $\bar{\chi}_2 \in \mathbb{R}^{nM|\mathcal{M}_2|}$ are the vectors collecting $\chi_i$ for all $i \in \mathcal{M}_1$ and $i \in \mathcal{M}_2$, respectively, $I_{nM}$ denotes the $nM \times nM$ identity matrix, $\otimes$ denotes the Kronecker product, and $E_1$, $E_2$ are the matrices such that $E = [E_1^T, E_2^T]^T$ is the incidence matrix of the graph $\mathcal{G}$. The structure of the optimization problem in (55) is now suited to applying the ADMM algorithm in a distributed manner; by defining the augmented Lagrangian $L(\bar{\chi}_1, \bar{\chi}_2, \lambda)$, it can be verified that minimizing $L$ with respect to $\bar{\chi}_1$ and $\bar{\chi}_2$ can be executed in parallel among the nodes $i \in \mathcal{M}_1$ and $i \in \mathcal{M}_2$, respectively, and the dual variables can also be updated in parallel for all nodes; see [18] for the detailed derivation. The overall distributed algorithm is shown in Algorithm 1.
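Before turning to the algorithm's details, the Kronecker-product consensus constraint in (55) can be checked numerically on a small graph: it stacks $\chi_i - \chi_j$ for every edge, so it vanishes exactly when all local copies agree. The graph, the per-node dimension and the incidence-matrix sign convention below are illustrative assumptions:

```python
import numpy as np

# Path graph 1-2-3 with M1 = {1, 3}, M2 = {2}; edges e1 = {1,2}, e2 = {2,3}.
# Node-by-edge incidence matrix: +1 at one endpoint, -1 at the other.
D = np.array([[ 1.0,  0.0],   # node 1
              [-1.0,  1.0],   # node 2
              [ 0.0, -1.0]])  # node 3
E1 = D[[0, 2], :]   # rows of the nodes in M1
E2 = D[[1], :]      # rows of the nodes in M2

d = 2                                   # dimension of each local copy (nM in the paper)
chi = np.array([3.0, -1.0])             # common consensus value
chi_bar1 = np.tile(chi, 2)              # chi_1 and chi_3 stacked
chi_bar2 = np.tile(chi, 1)              # chi_2

# (E1^T kron I) chi_bar1 + (E2^T kron I) chi_bar2 stacks chi_i - chi_j per edge
residual = (np.kron(E1.T, np.eye(d)) @ chi_bar1
            + np.kron(E2.T, np.eye(d)) @ chi_bar2)
```

With identical local copies the residual is zero; perturbing any one copy makes it nonzero, which is exactly the consensus-enforcing role of the constraint.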
In the algorithm, the variable $\chi_i$ is updated according to (56), where $\nu_i > 0$ is the $i$-th diagonal element of $E_1 E_1^T$, and $\zeta > 0$ is a constant parameter for the quadratic penalty added to the objective function to define the augmented Lagrangian. By Assumption 2, the updates in (56) can be executed in parallel for all nodes in $\mathcal{M}_1$. Once all updates in $i \in \mathcal{M}_1$ are done, they are transmitted to the corresponding neighbors. Similarly, the updates for all nodes in $\mathcal{M}_2$ are executed in parallel and are transmitted to their neighbors. Afterwards, the variables $\mu_i$ related to the dual variables are updated in parallel for all nodes $i \in \mathcal{V}$. The above procedure is iterated until the number of iterations reaches a prescribed threshold $L_{\max}$, yielding the estimates of states and faults, denoted as $\hat{\chi}_i(k)$ and $\hat{\varsigma}_i(k)$, respectively.

Algorithm 1: Distributed implementation.
1: Set $k = 0$ and $\hat{\chi}_i(k-1) = 0$, $i \in \{1, \ldots, M\}$; (initialization)
2: for each $k \in \mathbb{N}$ do
3:   Set $\mu_i^1 = \chi_i^1 = 0$, $i \in \{1, \ldots, M\}$, and $L = 1$;
4:   while $L < L_{\max}$ do
5:     for all $i \in \mathcal{M}_1$ do
6:       Set $v_i^L = \mu_i^L - \zeta \sum_{j \in N_i} \chi_j^L$ and solve the following problem:
         $\min_{\chi_i} \frac{1}{M}\|\tilde{\chi}_i\|_1 + (v_i^L)^T \chi_i + \frac{\nu_i \zeta}{2}\|\chi_i\|^2$ s.t. $y_i = C_i\chi_i$ ($y_i = C_{iq}\chi_i$ if $i = 1$),  (56)
         where $\tilde{\chi}_i = \chi_i - A\hat{\chi}_i(k-1) - B\kappa(\hat{\chi}_i(k-1))$, and let $\chi_i^{L+1}$ be the optimal solution to (56).
7:       Send $\chi_i^{L+1}$ to all $j \in N_i$;
8:     end
9:     for all $i \in \mathcal{M}_2$ do
10:      Repeat the same procedure as in lines 6-7;
11:    end
12:    for all $i \in \mathcal{V}$ do
13:      Set $\mu_i^{L+1} = \mu_i^L + \zeta \sum_{j \in N_i} (\chi_i^{L+1} - \chi_j^{L+1})$;  (57)
14:    end
15:    Set $L := L + 1$;
16:  end
17:  For all $i \in \{1, \ldots, M\}$, set $\hat{\chi}_i(k) = \chi_i^L$ and $\hat{\varsigma}_i(k) = \hat{\chi}_i(k) - A\hat{\chi}_i(k-1) - B\kappa(\hat{\chi}_i(k-1))$;
18: end

Based on the above algorithm, the following convergence property is satisfied:

Lemma 2. For given $k \in \mathbb{N}\setminus\{0\}$ and $\chi \in \mathbb{R}^{nM}$, let $\hat{x}(k)$ be the solution to (19) with $\hat{x}(k-1) = \chi$, and let $\hat{\chi}_i(k)$, $i \in \{1, \ldots, M\}$, be the estimated states obtained by applying Algorithm 1 at $k$ with $\hat{\chi}_i(k-1) = \chi$, $i \in \{1, \ldots, M\}$. Then, it follows that $\hat{\chi}_i(k) \to \hat{x}(k)$, $i \in \{1, \ldots, M\}$, as $L_{\max} \to \infty$.
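To make the $\mathcal{M}_1$/$\mathcal{M}_2$ update schedule and the dual update (57) concrete, the following toy instance runs the same Gauss-Seidel sweep on a 3-node path for consensus averaging. The $\ell_1$ objective and equality constraints of (56) are replaced here by a simple quadratic local cost so that the per-node subproblem has a closed form; this substitution is purely illustrative and is not the paper's estimator:

```python
import numpy as np

# min sum_i (chi_i - b_i)^2 / 2  s.t. chi_i = chi_j on edges of the path 0-1-2
b = np.array([1.0, 2.0, 3.0])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
M1, M2 = [0, 2], [1]                  # independent sets of the bipartite path
deg = {i: len(neighbors[i]) for i in neighbors}
zeta = 1.0

chi = np.zeros(3)
mu = np.zeros(3)
for _ in range(200):
    # primal sweeps: first all of M1 in parallel, then all of M2 (lines 5-11)
    for group in (M1, M2):
        for i in group:
            v = mu[i] - zeta * sum(chi[j] for j in neighbors[i])
            # closed-form minimizer of (chi_i - b_i)^2/2 + v*chi_i + deg_i*zeta/2*chi_i^2
            chi[i] = (b[i] - v) / (1.0 + zeta * deg[i])
    # dual update, cf. (57)
    for i in neighbors:
        mu[i] += zeta * sum(chi[i] - chi[j] for j in neighbors[i])
```

For this strongly convex problem the two-block ADMM iteration drives all local copies to the consensus optimum, here the mean of `b`, mirroring the consensus behavior stated in Lemma 2.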
Lemma 2 states that the state estimate at $k$ obtained by applying Algorithm 1 converges to a solution to (19). The result follows exactly the same lines as [18] and is omitted in this paper. Recalling that $\hat{x}(k-1) = 0$ for $k = 0$ (see Section III-B), we have $\hat{x}(k-1) = \hat{\chi}_i(k-1)$, $i \in \{1, \ldots, M\}$, for $k = 0$. Thus, the following is immediate by applying Lemma 2 for all $k \in \mathbb{N}$:

Theorem 4. Suppose that Algorithm 1 is implemented for all $k$. Then, it follows that $\hat{\chi}_i(k) \to \hat{x}(k)$, $i \in \{1, \ldots, M\}$, $\forall k \in \mathbb{N}$, as $L_{\max} \to \infty$.

VII. ILLUSTRATIVE EXAMPLE

In this section we provide an illustrative example to validate our control scheme. We consider a control problem for a multi-vehicle system with 9 vehicles (nodes) moving in a two-dimensional plane. Let $p_i = [p_{xi}; p_{yi}] \in \mathbb{R}^2$ and $v_i = [v_{xi}; v_{yi}] \in \mathbb{R}^2$, $i \in \{1, \ldots, 9\}$, be the position and the velocity of vehicle

$i$, respectively. The dynamics of vehicle $i$ is assumed to be a double integrator:

$\dot{x}_i = \begin{bmatrix} 0 & I_2 \\ 0 & 0 \end{bmatrix} x_i + \begin{bmatrix} 0 \\ I_2 \end{bmatrix} u_i + f_i$,  (58)

where $x_i = [p_i; v_i] \in \mathbb{R}^4$, $u_i = [u_{xi}; u_{yi}] \in \mathbb{R}^2$ is the control input, and $f_i \in \mathbb{R}^4$ is the fault signal. We discretize the system in (58) under a zero-order hold with a 0.5 s sampling interval to obtain $x_i(k) = A_i x_i(k-1) + B_i u_i(k-1) + f_i(k)$, where $A_i$ and $B_i$ are the appropriate matrices. We assume that only the velocity states are subject to faults, i.e., $f_i(k) = [0, 0, f_{xi}, f_{yi}]^T$. Since we have $\dot{v}_{xi} = u_{xi} + f_{xi}$ and $\dot{v}_{yi} = u_{yi} + f_{yi}$, one can also see that this situation covers the case when actuators are subject to physical faults or to malicious attacks.

Fig. 2. Simulation scenario. Each node represents a vehicle and each edge represents a communication and sensing capability between the corresponding vehicles. As shown in the figure, the graph is assumed to be bidirectional and a tree (hence bipartite). The gray zone represents an indoor environment that forces the leader to be in non-active mode.

The network graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, indicating communication and sensing capabilities among the vehicles, as well as the scenario in this example, is shown in Fig. 2. As shown in the figure, there exist two environments: outdoor and indoor. In the outdoor environment, the leader is able to communicate with GPS satellites to obtain the state information $x_1$, which allows the leader to be in active mode. In the indoor environment, the leader is not able to utilize the GPS due to signal loss caused by the walls of the building, and is thus forced to be in non-active mode. The leader initially starts in the outdoor environment to ensure Assumption 1 and moves along the $x$-axis with a constant speed $v_{x1} = 1$, with which it enters the indoor environment.
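The zero-order-hold discretization of the double integrator mentioned above can be computed exactly via the augmented matrix exponential. The sketch below does this per axis; the sampling period is taken as 0.5 s for illustration (an assumption, matching the interval used in this example), and scipy's `expm` is an implementation choice:

```python
import numpy as np
from scipy.linalg import expm

# Per-axis double integrator: A_c = [[0, 1], [0, 0]], B_c = [[0], [1]].
# Exact ZOH discretization: exp([[A_c, B_c], [0, 0]] * Ts) = [[A_d, B_d], [0, I]].
Ts = 0.5  # assumed sampling period
Ac = np.array([[0.0, 1.0],
               [0.0, 0.0]])
Bc = np.array([[0.0],
               [1.0]])
aug = np.zeros((3, 3))
aug[:2, :2] = Ac
aug[:2, 2:] = Bc
Md = expm(aug * Ts)
Ad, Bd = Md[:2, :2], Md[:2, 2:]
# Closed form for this system: Ad = [[1, Ts], [0, 1]], Bd = [[Ts^2/2], [Ts]]
```

The full 4-state matrices $A_i$, $B_i$ follow by applying the same discretization to both axes (e.g., as Kronecker products with $I_2$).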
More specifically, we assume that the active/non-active mode is given by

$a_1(k) = \begin{cases} 0, & \text{if } k \in [10, 30], \\ 1, & \text{otherwise}. \end{cases}$  (59)

Note that for the overall system derived in (12), the pair $(A, C_1)$ is observable, while $(A, C_0)$ is not. This means that observability is lost while the leader is non-active during $k \in [10, 30]$. Despite such unobservability, we will show below that the proposed estimator provides a correct estimation of both the states and the fault signals. The control law for the leader is given by $u_{x1} = -v_{x1} + 1$, $u_{y1} = 0$, so that it moves along the $x$-axis with a constant velocity $v_{x1} = 1$. All the other vehicles aim to follow the leader by interacting with their neighbors. Specifically, the control law for the followers is given by

$u_{xi} = -\sum_{j \in N_i} c_1 (p_{xi} - p_{xj} + d_{xij})$, $u_{yi} = -\sum_{j \in N_i} c_2 (p_{yi} - p_{yj} + d_{yij})$,

where $c_1, c_2 > 0$ are given constant gains and $[d_{xij}; d_{yij}]$ is the desired distance vector from $i$ to $j$.

Fig. 3. True trajectories of the fault signals $f_{xi}$, $f_{yi}$ injected into the vehicles and of the states $p_{xi}$, $p_{yi}$: (a) fault signals $f_{xi}$ and $f_{yi}$; (b) trajectories of $p_{xi}$; (c) trajectories of $p_{yi}$.

Fig. 3 illustrates the fault signals injected into the vehicles and the actual trajectories of $p_i$. As shown in the figure, vehicles 2, 4, 6 and 9 are assumed to be subject to faults of different shapes while the leader is in non-active mode. As a consequence, the resulting trajectories exhibit somewhat faulty behaviors, as shown in Fig. 3. Fig. 4 illustrates the states and faults estimated by the leader

Fig. 4. Estimated fault signals and states of $\chi_1(k)$ by applying Algorithm 1 (red dotted lines): (a) estimated fault signals $f_{xi}$ and $f_{yi}$; (b) estimated trajectories of $p_{xi}$; (c) estimated trajectories of $p_{yi}$.

Fig. 5. Estimated fault signals and states by applying the Kalman filter (red dotted lines): (a) estimated fault signals $f_{xi}$ and $f_{yi}$; (b) estimated trajectories of $p_{xi}$; (c) estimated trajectories of $p_{yi}$.

(i.e., $\chi_1(k)$) by applying Algorithm 1. As shown in the figure, the state and fault signals are appropriately estimated by applying the proposed approach. This is due to the fact that the number of faulty vehicles is $4 < 9/2$, which satisfies the condition guaranteeing correct estimation from Theorem 2. For comparison, we illustrate in Fig. 5 the estimated trajectories obtained by applying the Kalman filter. When implementing the Kalman filter, we utilize the update shown in (22), and the fault signals are estimated according to (20). As the figure shows, the estimation error diverges while the leader is non-active. The main reason for this is that observability is lost when the leader is non-active, and the Kalman filter becomes vulnerable to the fault signals. Thus, the proposed approach is shown to be more resilient against fault signals than Kalman filtering, as correct estimates are provided by our algorithm even when the overall system is unobservable. While the estimated states in Fig. 4 seem to exactly match the actual ones, the estimation error actually increases due to the numerical errors of solving the optimization problem. To see this, we illustrate in Fig. 6 the sequence of state estimation errors $\|x(k) - \hat{\chi}_1(k)\|_2$ for $k \in [0, 35]$.
As shown in the figure, the error becomes larger as time evolves while the leader is in non-active mode, and it becomes much smaller as soon as the leader becomes active again at $k = 31$. Indeed, this corresponds to the observation described in Section V: the upper bound of the error may grow according to (51) while the leader is non-active, and it is reset to $0$ (an arbitrarily small value in this example) when the leader becomes active. Finally, to see how the number of faulty nodes affects the