MULTISCALE DISTRIBUTED ESTIMATION WITH APPLICATIONS TO GPS AUGMENTATION AND NETWORK SPECTRA


MULTISCALE DISTRIBUTED ESTIMATION WITH APPLICATIONS TO GPS AUGMENTATION AND NETWORK SPECTRA

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF AERONAUTICS AND ASTRONAUTICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Christina Selle
June 2010

© 2010 by Christina Selle. All Rights Reserved. Re-distributed by Stanford University under license with the author. This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.

This dissertation is online at:

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Matthew West, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Sanjay Lall, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Per Enge

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Distributed estimation uses a network of sensors to measure a set of variables. The computation tasks required for finding the optimal estimate can be divided among the sensor nodes in a way that can be implemented as an iterative process using nodes with little computational power. Most algorithms for distributed estimation work for small networks, but convergence rates decrease with network size, making them impractical for use in large networks. We present a consensus algorithm with a convergence rate that scales logarithmically with network size by arranging nodes in a multigrid network structure. The algorithm can adapt to changes in the network structure and allows for selection of several parameters, representing a trade-off between performance and robustness of the network. We also describe how the algorithm is adapted to account for time-varying measurements and measurement weights. We present two applications of these methods. Our first application is an algorithm that allows us to determine the spectral properties of a state transition matrix on the network. Since the convergence rate of a consensus algorithm is related to the spectral properties of the state transition matrix, we can use this information to evaluate the effects of changes to the network structure. Our second application is a distributed GPS augmentation system. Traditional GPS augmentation systems use reference receivers to find a set of error correction values, which is broadcast to surrounding mobile receivers. Our distributed augmentation system uses only mobile receivers with unknown locations, which are able to obtain a set of correction values by sharing and processing data in a distributed network. The resulting method can be used to improve GPS point positioning accuracy in areas where fixed augmentation systems are not available.

Acknowledgments

This work was supported by a William R. and Sara Hart Kimball Stanford Graduate Fellowship, and I am deeply thankful to the Kimball family for this support. I would like to thank my adviser, Matt West, for all of the great ideas, guidance, advice, suggestions, encouragement, LaTeX tips, and math lessons he shared with me during my time at Stanford. I also want to express my gratitude to Sanjay Lall, who stepped in as my official Stanford adviser half-way through my journey. Per Enge and Sheri Sheppard have been great professors for me to work with as a teaching assistant, and have also provided valuable advice. Sigrid Close and Ellen Kuhl provided some helpful feedback during and after my Ph.D. oral examination. I could not have made it through graduate school without the support of my friends and family, including my parents Hartmut and Marie-Luise Mester, my sister Mareike Mester, my grandparents, my friends Adam Grossman, Fraser Cameron, Marianne Karplus, Tracy Rubin, the Carlstrom family, and the group of Aeronautics and Astronautics graduate students who shared this journey with me. Last but not least, I would like to thank my husband Andrew Selle for all of his love and support.

Contents

Abstract
Acknowledgments

1 Introduction

2 Multiscale networks for distributed consensus algorithms
   2.1 Introduction
   2.2 Construction of a multilevel network
   2.3 Invariant distribution offset factor determination
   2.4 Adjusting self-weights for improved performance
   2.5 Adjusting the network for broken edges and nodes
   2.6 Performance and Robustness trade-offs
   2.7 Two dimensional numerical example
   2.8 Measurement updates
   2.9 Sensor weights
   2.10 Network spectral properties
   2.11 Conclusion

3 Distributed GPS augmentation
   3.1 Introduction
   3.2 Position solution for a single receiver
      3.2.1 Gauss-Newton method
      3.2.2 Gradient descent method
      3.2.3 Newton's method
      3.2.4 Comparison of different methods
   3.3 Multiple receivers with delay estimation
      3.3.1 Gauss-Newton method
      3.3.2 Accuracy and sensitivity to random errors
      3.3.3 Regularized delay estimation
      3.3.4 Distributed delay estimation
      3.3.5 Regularized distributed delay estimation
      3.3.6 Comparison of the different methods
   3.4 Performance Comparison
   3.5 Multigrid methods for distributed delay estimation
   3.6 Conclusion

4 Distributed spectral methods
   4.1 Introduction and Assumptions
   4.2 Spectral methods for symmetric matrices
   4.3 Adapting spectral methods for distributed networks
   4.4 Spectral methods for nonsymmetric matrices
   4.5 Distributed concurrent computation of eigenvalues
   4.6 Numerical Example
   4.7 Using spectral information for supernode placement
   4.8 Conclusion

A Distributed spectral algorithms
   A.1 Power method
   A.2 QR-factorization

List of Tables

3.1 Typical GPS error budget (RMS values)

List of Figures

2.1 Simple two-level network with five base-level nodes (gray) and two supernodes (black). The base-level nodes form a ring
2.2 Transition matrix for Theorem
2.3 Linear system for Theorem
2.4 State transition matrix with adjusted supernode self-weights
2.5 Comparison of convergence for a ring network with three levels using Metropolis weights and the multigrid weights described here with and without supernode self-weight adjustments
2.6 Spectral gap vs. number of nodes in the base level
2.7 Centralization Robustness vs. Performance trade-off
2.8 Performance vs. Robustness for various α and β values. The red cross indicates the parameter values chosen for subsequent numerical examples
2.9 Example network layout
2.10 Convergence results for the example network
2.11 Eigenvalues of network with 400 base level nodes with various numbers of levels
2.12 Selected eigenvectors of a single level ring with 400 nodes
2.13 Convergence times of different ring-shaped networks given the eigenvectors of a single level ring as starting value
2.14 Eigenvalues of network with various numbers of levels
2.15 v_2 of the base level of the network shown in figure
2.16 v_6 of the base level of the network shown in figure
2.17 v_30 of the base level of the network shown in figure
2.18 Convergence times of different networks given the eigenvectors of a single level network as starting value
3.1 Convergence for 50 receivers without delay estimation
3.2 Effect of including delay estimation on position estimates
3.3 Convergence for 500 receivers without delay estimation
3.4 Convergence for 50 receivers with delay estimation
3.5 Convergence for 500 receivers with delay estimation
3.6 Mean positioning error as a function of the number of satellites
3.7 Mean objective value function per receiver as a function of the number of receivers, with and without delay estimation, and for a hypothetical case where correlated delays are set to zero
3.8 Ratio of total position error without and with delay estimation
3.9 Ratio of total position error without and with delay estimation in an extended network
3.10 Ratio of total position error without and with delay estimation with large multipath error
3.11 Ratio of total position error without and with delay estimation in an extended network with large multipath errors
3.12 Example network layout
3.13 Positioning error convergence for the receiver network example
3.14 Objective function value for the receiver network example
4.1 Convergence of the orthogonal basis for the distributed QR method
4.2 Convergence of the orthogonal basis for the distributed power method
4.3 Convergence of the eigenvectors for the distributed QR method
4.4 Convergence of the eigenvectors for the distributed power method
4.5 Convergence of the eigenvalues for the distributed QR method
4.6 Convergence of the eigenvalues for the distributed power method
4.7 Network from figure 2.9 in v_2-v_3 space
4.8 Final supernode placements in v_2-v_3 space
4.9 Final supernode placements in x-y space

Chapter 1

Introduction

This thesis describes a distributed multigrid consensus algorithm, as well as applications of this algorithm to GPS augmentation and graph-spectrum computations. Distributed estimation algorithms are used to provide optimal estimates of a variable, based on a set of measurements taken by a network of sensors. Distributed estimation algorithms have several advantages and disadvantages compared to centralized algorithms. While centralized algorithms require the availability of a single processor that is capable of running the estimation algorithm, distributed methods divide the computational tasks into smaller tasks that can be performed by nodes with lower computational capabilities. For the algorithms described in this thesis, we assume that the sensors themselves have some computational capabilities and form the network of nodes that runs the distributed estimation algorithms. Distributed methods can also be more robust than centralized methods, in many cases making it possible to obtain good results even if some of the nodes or communication links in the network fail. The networks used here for distributed estimation can be modeled as graphs, where the sensors are the nodes or vertices of the graph, and the communication links between nodes form the edges. Every node can store some limited amount of data for later use, and thus is modeled as having a self-loop. We also assume that all communication links are two-way links, but that weights associated with different directions of transmission between two nodes do not have to be equal, making the network a directed graph. At every discrete time step, each node receives data from adjacent nodes, and updates its stored variables. Chapter 2 describes an algorithm for distributed consensus. While consensus is a very basic operation for a distributed network to perform, there are many complex computations that can be reduced to a combination of consensus steps and simple operations that can be performed by each node in the network individually. The consensus algorithm described in chapter 2 is different from other consensus algorithms in that it uses a multigrid network structure. Multigrid methods are a tool commonly used for improving convergence rates of algorithms for solving differential equations by using several levels of increasing resolution in the discretization. Chapter 2 shows how a multigrid structure can be created to run a consensus algorithm in a distributed

network. In addition, the performance and robustness trade-offs of this algorithm are studied, and convergence rates and their dependencies on noise characteristics are compared to those of single-level networks. Chapter 2 also proposes some extensions of the basic multigrid algorithm for measurement updates and assigning weights to the node measurements. Chapter 3 describes how distributed methods, including the multigrid algorithm from chapter 2, can be used to create a distributed GPS augmentation system. Traditional GPS augmentation systems use a reference station to create error corrections, which are broadcast to mobile receivers and used in point positioning. The augmentation system described here does not use a fixed reference receiver, but instead calculates correction terms based only on the measurements obtained from a network of mobile receivers. If distributed methods are used, the augmentation system also does not require the use of a centralized station to compute the corrections, since all computation is done by the network of receivers. Chapter 4 describes a distributed eigenvalue method for nonsymmetric matrices. Most eigenvalue methods are difficult to adapt to distributed systems due to their dependence on matrix factorization, but the algorithm presented here can be reduced to a series of consensus processes and simple computations, and can therefore be run on a distributed network. This is of particular interest since it can be used to find a worst-case estimate of the convergence rate of a consensus algorithm, and thus monitor the status of the network if the structure of the network changes over time. If the right number of levels is selected in the construction of the multigrid networks described in this thesis, the convergence rate scales logarithmically with network size, making them practical for use in very large networks. Since microcontrollers and microprocessors are included in a wide variety of devices, and wireless communication is becoming more and more ubiquitous, there are many potential application areas where distributed estimation and control could be applied to large networks. The main contributions of this thesis can be summarized as follows. A novel multigrid algorithm for distributed consensus is presented, along with analysis of the trade-offs between robustness and performance that occur when various parameters

are selected for this algorithm. The convergence of this consensus method is compared to other single-level methods under various noise conditions. Chapter 3 includes an algorithm for a distributed GPS augmentation system, which differs from existing augmentation systems in that it requires neither stationary reference receivers with known positions, nor reference stations for centralized computations. Chapter 4 extends an existing distributed eigenvalue method for symmetric matrices to nonsymmetric matrices, while also describing how the power method can be adapted for distributed systems. The spectral information of a network is then used for determining appropriate supernode locations in a network. A review of the relevant literature is provided in the introduction section of each of the individual chapters in this thesis. Conclusions and some ideas for future work are provided at the end of each chapter.

Chapter 2

Multiscale networks for distributed consensus algorithms

2.1 Introduction

Distributed consensus algorithms [11][30][45] allow a network of computational nodes to iteratively exchange information between neighbors in order to compute the global average of a quantity. They can be used as the basis for many applications, such as distributed optimization methods [16] or control schemes [6][33]. While typically less efficient than a centralized algorithm, consensus methods have the advantages of distributing the work across all nodes in the network and of being robust to node and connection failure. The general framework for consensus methods considers each node synchronously updating its own value to a weighted average of the current values of its neighbors (as distinct from asynchronous gossip algorithms [3], for example). One of the most natural questions, therefore, is what graph structure and what weights should be chosen to give the fastest convergence of the algorithm to the consensus value while guaranteeing convergence [7]. The choice of optimal weights has been investigated in depth by [2][39][44], who used convex optimization and semidefinite programming to find the weights that minimize the magnitude of the second largest eigenvalue of the Markov chain defined by the consensus update. While such an approach gives the optimal choice of weights, it requires a centralized scheme for solving the optimization problem for the weights. An alternative to solving for the optimal weights is to choose a graph structure that gives fast consensus with some weight choice. This can be done by optimization [15], or by using a heuristic such as taking advantage of the fact that small-world networks [42] have fast consensus [40] and thus trying to add edges or nodes to enhance this property. Other possibilities include making the node updates random [19] or otherwise time-varying [20][29]. Networks can also have time-varying inputs or topologies due to the nature of the network rather than the consensus algorithm [31]. In this chapter we present an alternate scheme for producing a network to achieve fast consensus, based on the idea of multiscale networks. A simple example of a multiscale network with two levels is shown in figure 2.1. Figure 2.9 in section 2.7 shows a more complex network with three levels, which is used for numerical examples

throughout this thesis.

Figure 2.1: Simple two-level network with five base-level nodes (gray) and two supernodes (black). The base-level nodes form a ring.

We observe that a regular consensus method is similar to using Jacobi's method to solve the equation Lx = 0, where L is the graph Laplacian or a similar matrix. Unfortunately, the convergence rate of Jacobi's method is poor and scales badly as system size grows [36]. This is due to the fact that errors that vary slowly across the network are only slowly driven to zero by the Jacobi iteration, which uses only nearest-neighbor updates. One standard way of overcoming these deficiencies is to use multilevel algorithms, such as the multigrid method [41], where coarsened versions of the base-level graph are used to enhance the decay of slowly varying components. We build on this insight and give an algorithm for constructing multilevel networks for consensus. The basic multilevel network construction is presented in section 2.2, with a heuristic for adjusting the weights to enhance convergence in section 2.4. An algorithm for adjusting the edge weights in the presence of node and edge failures is given in section 2.5, and the trade-off between performance and robustness is investigated numerically in section 2.6. Section 2.7 presents a numerical example for a randomly generated graph embedded in 2D. Section 2.8 describes how the algorithm can be used if node measurements are time-varying, and section 2.9 presents

equations for adjusting the algorithm for calculating a weighted average of node measurements. Finally, in section 2.10 we present some examples of the changes to the network spectral properties due to adding a multigrid structure, and how this influences correlations between noise spatial frequency and convergence rates. While the algorithm described in this chapter only finds the mean of a single variable, it can be used as a basis for performing many more complex computations. For example, the variance of the node measurements can be found using a sequence of two consensus operations, the first to find the mean of the measurement values, and the second to sum the squares of deviations from the mean. Some applications, including the distributed GPS augmentation system described in chapter 3 and the spectral algorithm described in chapter 4, require adding vectors, which can be done by simply letting the state x of the network be a matrix, where each node stores the information contained in one row of the state matrix.

2.2 Construction of a multilevel network

By a multilevel network we mean one where nodes are arranged in levels or classes. Not all nodes are equal in their connection structures; rather, they are grouped. In a spatially embedded network, lower levels contain more nodes and have physically short-range connections, while higher levels contain fewer nodes that have longer-range connections. This mimics the multiscale structure generated by multilevel algorithms such as multigrid [41]. We refer to nodes in all upper levels as supernodes, to distinguish them from the nodes in the base level. A consensus problem starts with a network with a set of nodes N and a set of edges E connecting these nodes. Each node is given an initial value, and the purpose of the consensus algorithm is to find the mean of the initial states of all nodes. The initial values of all nodes are stored in the vector x(0). Starting with the initial values, at any time step t each node i takes a weighted average of the state values of its neighboring nodes to compute its own new state value x_i(t + 1). This process can

be represented as a multiplication with a state transition matrix P:

x(t + 1) = P^T x(t)   (2.2.1)

For a single-level network, Metropolis weights can be used to propagate the state as described in [45]. With Metropolis weights, the state transition matrix is (with d_i denoting the degree of node i)

P_{ij} = \begin{cases} \dfrac{1}{1 + \max\{d_i, d_j\}} & \text{if } \{i, j\} \in E \\ 1 - \sum_{\{i,k\} \in E} P_{ik} & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}   (2.2.2)

This is equivalent to the evolution of probability distributions in Markov chains, and we assume irreducibility and aperiodicity so that the state converges to a unique final state π, where

P^T \pi = \pi   (2.2.3)

State transition matrices that result from applying Metropolis weights are symmetric, and all row and column sums are equal to one. The invariant distribution is uniform, and represents the average of the initial states of the nodes:

\pi = \frac{1}{n} \sum_{i=1}^{n} x_i(0)   (2.2.4)

Metropolis weights can be computed quickly by the distributed network, and can be efficient for single-level networks. However, they result in inefficiencies when applied to multilevel networks. In particular, Metropolis weights for connections between supernodes in upper levels of the network are smaller than they would need to be to maximize the convergence rate, since Metropolis weights take into account only the degree of a node, but not other aspects of the geometry of the network, such as the length of edges in a spatial embedding.
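To make the single-level update concrete, the following minimal Python/NumPy sketch (an illustration added here, not code from the thesis) builds the Metropolis matrix of equation (2.2.2) for a small ring and iterates the update (2.2.1); the edge list and initial state are arbitrary example data.

    import numpy as np

    def metropolis_matrix(n, edges):
        # Metropolis weights, eq. (2.2.2): P_ij = 1/(1 + max(d_i, d_j)) on edges,
        # with self-weights chosen so that every row sums to one.
        deg = np.zeros(n, dtype=int)
        for i, j in edges:
            deg[i] += 1
            deg[j] += 1
        P = np.zeros((n, n))
        for i, j in edges:
            w = 1.0 / (1.0 + max(deg[i], deg[j]))
            P[i, j] = P[j, i] = w
        np.fill_diagonal(P, 1.0 - P.sum(axis=1))
        return P

    # five-node ring; the consensus iteration x(t+1) = P^T x(t), eq. (2.2.1)
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    P = metropolis_matrix(5, edges)
    x = np.array([1.0, 4.0, 2.0, 0.0, 3.0])
    for _ in range(200):
        x = P.T @ x
    print(x)  # every entry approaches 2.0, the mean of the initial values

Because the Metropolis matrix is symmetric and doubly stochastic, the iteration converges to the uniform average of equation (2.2.4).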

One method for constructing multilevel networks and finding their state transition matrices and invariant distributions is to first generate the base level, and then add the upper levels. Each superior level is generated by making an identical copy of the next lower level, and merging several nodes into supernodes. The nodes in each level are connected to their equivalent nodes in the levels directly above and below. This method can be used for constructing multiscale networks based on an arbitrary layout of the base-level network. It does, however, constrain the construction of the upper levels and the connections between levels, in that connections between supernodes must mirror the connections in the lowest level. It can therefore be applied in situations where the geometry of the upper levels of the network can be chosen to fit these constraints, or when the layers of supernodes are created by selecting some of the regular nodes in the base level to double as supernodes, and the base-level connections between nodes are also used to implement supernode edges. The first step in creating the multilevel network is duplicating the base level to create upper levels, and connecting each node to its corresponding node in the levels directly above and below, giving a so-called ladder network. The connections between different levels initially all have equal weights going up and down. For such a network, the invariant distribution of each level is equal to the invariant distribution π of the original base level, so that the overall invariant distribution is

\hat{\pi} = \frac{1}{n} \left[ \pi^T, \pi^T, \ldots, \pi^T \right]^T   (2.2.5)

Next, weights are added for connections between different levels, so that the values a node receives from superior levels can be given more weight than those from inferior levels. Using coefficients α_1, α_2, ..., α_n to denote weights for connections between nodes in each level, and β_1, β_2, ..., β_{n-1} for weights of connections between levels, the new state transition matrix is

\hat{P} = \begin{bmatrix} \alpha_1 P & (1-\alpha_1) I & 0 & \cdots & 0 \\ \beta_1 I & \alpha_2 P & (1-\alpha_2-\beta_1) I & \cdots & 0 \\ 0 & \beta_2 I & \alpha_3 P & \ddots & \vdots \\ \vdots & & \ddots & \ddots & (1-\alpha_{n-1}-\beta_{n-2}) I \\ 0 & \cdots & 0 & (1-\alpha_n) I & \alpha_n P \end{bmatrix}   (2.2.6)
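As an illustrative sketch of this construction (the convention for the top level is inferred from eq. (2.2.9), where the last sub-diagonal entry is (1 − α_n)), the block matrix of eq. (2.2.6) can be assembled as follows; P is a base-level transition matrix such as the Metropolis matrix above.

    import numpy as np

    def ladder_matrix(P, alpha, beta):
        # Assemble eq. (2.2.6) for a ladder network of len(alpha) identical levels.
        # alpha[i] weights the within-level edges of level i; beta[i] weights the
        # connection between levels i+1 and i. The top level has only the diagonal
        # and sub-diagonal blocks, so alpha[-1] + beta[-1] must equal 1 for its
        # rows to sum to one.
        n_lvl, m = len(alpha), P.shape[0]
        assert len(beta) == n_lvl - 1
        assert abs(alpha[-1] + beta[-1] - 1.0) < 1e-12
        I = np.eye(m)
        Phat = np.zeros((n_lvl * m, n_lvl * m))
        for i in range(n_lvl):
            rows = slice(i * m, (i + 1) * m)
            Phat[rows, rows] = alpha[i] * P
            if i > 0:  # coupling block with the level below (weight beta_{i-1})
                Phat[rows, (i - 1) * m:i * m] = beta[i - 1] * I
            if i < n_lvl - 1:  # coupling block with the level above
                up = 1.0 - alpha[i] - (beta[i - 1] if i > 0 else 0.0)
                Phat[rows, (i + 1) * m:(i + 2) * m] = up * I
        return Phat

    # e.g. a three-level ladder: ladder_matrix(P, [0.6, 0.6, 0.7], [0.2, 0.3])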

The merging of nodes to form q supernodes from p nodes in a level is described by the transformation matrix B_i ∈ R^{p×q}, where B_{ij} = 1 if and only if the original node i is merged into supernode j. B_i thus describes the mapping from the base level of nodes to level i. Note that B_1 = I. The transformation from the ladder network to the final network is described by

\hat{B} = \operatorname{diag}(B_1, B_2, \ldots, B_n)   (2.2.7)

\hat{P}' = \left( \hat{B}^T \hat{B} \right)^{-1} \hat{B}^T \hat{P} \hat{B} = \hat{B}^\dagger \hat{P} \hat{B}   (2.2.8)

\hat{P}' = \begin{bmatrix} \alpha_1 P & (1-\alpha_1) B_2 & 0 & \cdots & 0 \\ \beta_1 B_2^\dagger & \alpha_2 B_2^\dagger P B_2 & (1-\alpha_2-\beta_1) B_2^\dagger B_3 & \cdots & 0 \\ 0 & \beta_2 B_3^\dagger B_2 & \alpha_3 B_3^\dagger P B_3 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & (1-\alpha_{n-1}-\beta_{n-2}) B_{n-1}^\dagger B_n \\ 0 & \cdots & 0 & (1-\alpha_n) B_n^\dagger B_{n-1} & \alpha_n B_n^\dagger P B_n \end{bmatrix}   (2.2.9)

Figure 2.2: Transition matrix (2.2.9) for the theorem below. B^\dagger denotes the pseudoinverse of B, i.e. B^\dagger = (B^T B)^{-1} B^T.

\begin{bmatrix} \alpha_1 - 1 & \beta_1 & & & \\ & \alpha_2 + \beta_1 - 1 & \beta_2 & & \\ & & \alpha_3 + \beta_2 - 1 & \ddots & \\ & & & \alpha_{n-1} + \beta_{n-2} - 1 & \beta_{n-1} \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix} \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}   (2.2.10)

Figure 2.3: Linear system (2.2.10) for the theorem below.

Theorem. A multilevel network constructed as described above will have the state transition matrix (2.2.9) in figure 2.2 and invariant distribution \hat{\pi} given by

\hat{\pi} = \left[ (\gamma_1 \pi)^T, (\gamma_2 B_2^T \pi)^T, (\gamma_3 B_3^T \pi)^T, \ldots, (\gamma_n B_n^T \pi)^T \right]^T,   (2.2.11)

where the coefficients γ_1, γ_2, ..., γ_n are found by solving the linear system (2.2.10) shown in figure 2.3.

Proof. Using \hat{P} from eq. (2.2.6) in eq. (2.2.8) yields the state transition matrix shown in figure 2.2. Given \hat{P}', we can show that \hat{\pi} in equation (2.2.11) is indeed the invariant

distribution:

\hat{P}'^T \hat{\pi} = \begin{bmatrix} (\alpha_1 \gamma_1 + \beta_1 \gamma_2)\, \pi \\ ((1-\alpha_1)\gamma_1 + \alpha_2 \gamma_2 + \beta_2 \gamma_3)\, B_2^T \pi \\ \vdots \\ ((1-\alpha_{i-1}-\beta_{i-2})\gamma_{i-1} + \alpha_i \gamma_i + \beta_i \gamma_{i+1})\, B_i^T \pi \\ \vdots \\ ((1-\alpha_{n-1}-\beta_{n-2})\gamma_{n-1} + \alpha_n \gamma_n)\, B_n^T \pi \end{bmatrix} = \hat{\pi}   (2.2.12)

Since the α and β coefficients are known, this can be written as a system of linear equations. Omitting the last row, which is redundant since each column of the original system sums to zero, and adding the condition that the sum of the γ's has to be one, we get

\begin{bmatrix} \alpha_1 - 1 & \beta_1 & 0 & \cdots & 0 \\ 1-\alpha_1 & \alpha_2 - 1 & \beta_2 & \cdots & 0 \\ 0 & 1-\alpha_2-\beta_1 & \alpha_3 - 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & \beta_{n-1} \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix} \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}   (2.2.13)

The linear system in figure 2.3 is constructed by taking the sum of each row except the last with all rows above it. As long as α_i + β_{i-1} < 1 for all i, there is a unique solution. Given this solution, each node can determine the consensus value from the invariant distribution: since the invariant distribution is not uniform, the state of each node has to be multiplied by a factor obtained from the solution of the linear system above.
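Continuing the illustrative sketches, the merge matrices B_i and the γ coefficients of the theorem can be computed directly; gamma_coefficients implements the bidiagonal system as reconstructed in figure 2.3 above, and the merged transition matrix of eq. (2.2.8) is a pseudoinverse projection.

    import numpy as np

    def merge_matrix(assign, q):
        # B_i of section 2.2: B[i, j] = 1 iff node i of the level copy is merged
        # into supernode j; assign[i] gives the supernode index of node i.
        B = np.zeros((len(assign), q))
        B[np.arange(len(assign)), assign] = 1.0
        return B

    def merged_transition(Phat_ladder, Bhat):
        # Eq. (2.2.8): project the ladder matrix onto the merged network,
        # Bhat being the block-diagonal matrix of eq. (2.2.7).
        return np.linalg.pinv(Bhat) @ Phat_ladder @ Bhat

    def gamma_coefficients(alpha, beta):
        # Figure 2.3 (as reconstructed above): for k = 1..n-1,
        # (alpha_k + beta_{k-1} - 1) gamma_k + beta_k gamma_{k+1} = 0 with
        # beta_0 = 0, plus the normalization sum(gamma) = 1.
        n = len(alpha)
        A = np.zeros((n, n))
        for k in range(n - 1):
            A[k, k] = alpha[k] + (beta[k - 1] if k > 0 else 0.0) - 1.0
            A[k, k + 1] = beta[k]
        A[-1, :] = 1.0
        rhs = np.zeros(n)
        rhs[-1] = 1.0
        return np.linalg.solve(A, rhs)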

2.3 Invariant distribution offset factor determination through consensus

As an alternative to solving the equations presented above for deriving the consensus value from the invariant distribution, the factors can also be found by using the consensus method itself. This also makes it easy to relax the assumption that every node starts out with a measurement value. Until now it was assumed that each node had access to a unique measurement, but this might not be the case in some implementations of this method. In a real-life situation, a sensor could malfunction while the computation and communication capabilities of a node might be working normally. Another scenario where this occurs is when supernodes are implemented using the hardware of an already existing network.

Theorem. Let the elements of the vector κ be the factors the consensus value has to be multiplied with to obtain the invariant distribution,

\hat{\pi}_i = \kappa_i \sum_{k=1}^{n} x_k(0)   (2.3.1)

Let x_k(0) = 0 if node k has no measurement available for inclusion in the consensus process. Also, let η be a vector so that η_i = 1 if node i has a measurement, and η_i = 0 otherwise. Then κ can be found by applying the consensus method to η:

\kappa = (P^T)^\infty \eta   (2.3.2)

Proof. Let m be the consensus value, which is the mean of all node measurement values:

m = \frac{\sum_{k=1}^{n} x_k(0)}{\sum_{k=1}^{n} \eta_k}   (2.3.3)

The invariant distribution can be expressed in terms of κ and m as

\hat{\pi}_i = \kappa_i m   (2.3.4)

Now, for the consensus process that uses η as the initial state vector,

\hat{\pi}_i = \kappa_i \frac{\sum_{k=1}^{n} \eta_k}{\sum_{k=1}^{n} \eta_k} = \kappa_i   (2.3.5)
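A minimal sketch of this theorem (illustrative; the fixed iteration count stands in for running the distributed iteration to convergence):

    import numpy as np

    def kappa_from_eta(P, eta, steps=2000):
        # Run the same consensus update on the indicator vector eta
        # (eta_i = 1 if node i has a measurement, 0 otherwise); the limit
        # approximates kappa = (P^T)^infinity eta, eq. (2.3.2).
        kappa = np.asarray(eta, dtype=float).copy()
        for _ in range(steps):
            kappa = P.T @ kappa
        return kappa

Once both processes have converged, each node recovers the consensus value by dividing its state by its own κ_i, per eq. (2.3.4).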

In the case where every node has a measurement value, this simplifies to

\kappa = (P^T)^\infty \mathbf{1}   (2.3.6)

2.4 Adjusting self-weights for improved performance

The method for constructing the propagation matrix has one deficiency: since supernodes are constructed by merging several nodes in one level into one supernode, the self-weights of the supernodes are on average significantly higher than the weights for transmitting states between supernodes in a level. The convergence rate can be improved by reducing the self-weights of the supernodes, so that they are on average equal to the weights between nodes. This can be done by taking advantage of the fact that

\left( (1+\delta) P^T - \delta I \right) \pi = P^T \pi   (2.4.1)

Such an adjustment is applied to all submatrices that describe the connections between supernodes in their respective level, i.e. all block matrices on the diagonal of \hat{P}' with the exception of the first block matrix on the diagonal, which describes the connections between the base-level nodes. The δ coefficients for each level are chosen such that the mean weight for connections between nodes is equal to the mean self-weight.

Theorem. If the multilevel network with state transition matrix \hat{P}' in equation (2.2.9) has weight changes given by

\delta_j = \min\left\{ \frac{a-b}{1-a+b},\ \frac{\min\{\operatorname{diag}(A_j)\}}{1-\min\{\operatorname{diag}(A_j)\}} \right\}   (2.4.2)

a = \frac{1}{n} \operatorname{trace}(A_j)   (2.4.3)

b = \frac{1}{n^2-n} \left( \mathbf{1}^T A_j \mathbf{1} - \operatorname{trace}(A_j) \right)   (2.4.4)

A_j = B_j^\dagger P B_j   (2.4.5)

then it will have the same invariant distribution as the unmodified network. With

these changes, the new state transition matrix is equation (2.4.7) in figure 2.4.

\hat{P}_a = \begin{bmatrix} \alpha_1 P & (1-\alpha_1) B_2 & 0 & \cdots & 0 \\ \beta_1 B_2^\dagger & \alpha_2 \left( (1+\delta_2) B_2^\dagger P B_2 - \delta_2 I \right) & (1-\alpha_2-\beta_1) B_2^\dagger B_3 & \cdots & 0 \\ 0 & \beta_2 B_3^\dagger B_2 & \alpha_3 \left( (1+\delta_3) B_3^\dagger P B_3 - \delta_3 I \right) & \ddots & \vdots \\ \vdots & & \ddots & \ddots & \\ 0 & \cdots & 0 & (1-\alpha_n) B_n^\dagger B_{n-1} & \alpha_n \left( (1+\delta_n) B_n^\dagger P B_n - \delta_n I \right) \end{bmatrix}   (2.4.7)

Figure 2.4: State transition matrix with adjusted supernode self-weights.

Proof. Adjusting the blocks on the diagonal of \hat{P}' as described above does not change the products of those entries with the corresponding parts of the invariant distribution:

\left( (1+\delta_i) B_i^\dagger P B_i - \delta_i I \right) \gamma_i B_i^T \pi = B_i^\dagger P B_i \, \gamma_i B_i^T \pi   (2.4.6)

Therefore, the invariant distribution \hat{\pi} remains the same when the supernode self-weights are adjusted.

While adjusting supernode self-weights does not necessarily result in optimal values for \hat{P}_a, it is a heuristic that yields significant improvements in the spectral gap ρ = 1 − λ_2. Figure 2.5 demonstrates the effect that adjusting supernode self-weights has on the convergence rate. For a ring-shaped network with three levels of nodes, three methods were used to construct the state transition matrix: Metropolis weights, and the method described in the previous section with and without supernode self-weight adjustments. The computational cost of generating the networks was not taken into account here, since it is assumed that networks are used for multiple computations. Using the multigrid method, an initial improvement in the convergence rate compared to Metropolis weights is achieved, as averaging of the states of nodes connected to the same supernode is accelerated. However, since connections between supernodes are weak, convergence slows down after a few steps. With the improvement of adjusting supernode self-weights, a significantly higher convergence rate is achieved even after these initial steps.
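In code, the self-weight heuristic reads as follows (a sketch following the reconstruction of eqs. (2.4.2)–(2.4.5) above; A is the supernode block B_j^† P B_j of one upper level, and at least two supernodes per level are assumed):

    import numpy as np

    def self_weight_delta(A):
        # delta_j per eq. (2.4.2): bring the mean self-weight down to the mean
        # off-diagonal weight, capped so that no self-weight becomes negative.
        n = A.shape[0]
        assert n > 1
        a = np.trace(A) / n                        # mean self-weight, eq. (2.4.3)
        b = (A.sum() - np.trace(A)) / (n * n - n)  # mean off-diagonal weight, eq. (2.4.4)
        d_min = np.diag(A).min()
        return min((a - b) / (1.0 - a + b), d_min / (1.0 - d_min))

    def adjust_self_weights(A):
        # (1 + delta) A - delta I, the adjustment of eq. (2.4.1); any invariant
        # vector of A is also invariant for the adjusted block.
        d = self_weight_delta(A)
        return (1.0 + d) * A - d * np.eye(A.shape[0])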

Figure 2.5: Comparison of convergence for a ring network with three levels using Metropolis weights and the multigrid weights described here with and without supernode self-weight adjustments.

2.5 Adjusting the network for broken edges and nodes

In order to be robust, the network should continue to function when one or more of its edges or nodes stop functioning, as long as the network is still connected. A broken node is a special case of multiple broken edges, since it is equivalent to breaking all edges of the affected node and removing it from the network. One simple method for adjusting for a broken edge is for the adjacent nodes to modify their self-weights so that the row sums of the weight matrix are again equal to 1. Affected nodes only need to know the weights of their remaining edges to do this. When this method is used, the invariant distribution does not change, as long as the network is still connected. This can be shown by considering the joint probability matrix W, where

W_{ij} = P_{ij} \pi_i   (2.5.1)

The column sums of W are equal to the invariant distribution:

\sum_j W_{ji} = \pi_i   (2.5.2)

When the edge between nodes p and q is broken, P is adjusted in the following way:

P'_{pq} = P'_{qp} = 0   (2.5.3)

P'_{pp} = P_{pp} + P_{pq}   (2.5.4)

P'_{qq} = P_{qq} + P_{qp}   (2.5.5)

This results in the following adjustments to W:

W'_{pq} = W'_{qp} = 0   (2.5.6)

\frac{W'_{pp}}{\pi'_p} = \frac{W_{pp}}{\pi_p} + \frac{W_{pq}}{\pi_p}   (2.5.7)

\frac{W'_{qq}}{\pi'_q} = \frac{W_{qq}}{\pi_q} + \frac{W_{qp}}{\pi_q}   (2.5.8)

These adjustments preserve the symmetry of W. The column sums of W' are:

\sum_j W'_{ji} = \sum_j W_{ji} = \pi_i = \pi'_i \quad \text{for } i \neq p, q   (2.5.9a)

\sum_j W'_{jp} = \sum_{j \neq p,q} W'_{jp} + W'_{pp} + W'_{qp} = \sum_{j \neq p,q} W_{jp} + (W_{pp} + W_{pq}) \frac{\pi'_p}{\pi_p} \quad \text{for } i = p   (2.5.9b)

\sum_j W'_{jq} = \sum_{j \neq p,q} W'_{jq} + W'_{qq} + W'_{pq} = \sum_{j \neq p,q} W_{jq} + (W_{qq} + W_{qp}) \frac{\pi'_q}{\pi_q} \quad \text{for } i = q   (2.5.9c)
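In code, the repair rule of equations (2.5.3)–(2.5.5) is a purely local update (an illustrative sketch; p and q are the endpoints of the failed edge):

    import numpy as np

    def repair_broken_edge(P, p, q):
        # Eqs. (2.5.3)-(2.5.5): the two endpoints fold the weight of the lost
        # edge into their self-loops, so every row of P still sums to one.
        # Each endpoint only needs its own row, so no coordination is required.
        P = P.copy()
        P[p, p] += P[p, q]
        P[q, q] += P[q, p]
        P[p, q] = 0.0
        P[q, p] = 0.0
        return P

The theorem below confirms that this local repair leaves the invariant distribution unchanged as long as the network stays connected.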

Theorem. If the multilevel network with state transition matrix \hat{P}_a in equation (2.4.7) has some edges removed but remains connected, then updating the transition matrix by (2.5.3)–(2.5.5) ensures that the invariant distribution remains unchanged.

Proof. Equations (2.5.9b) and (2.5.9c) can be solved for π'_p and π'_q:

\pi'_p = \frac{(\pi_p - W_{pp} - W_{qp})\, \pi_p}{\pi_p - W_{pp} - W_{qp}} = \pi_p   (2.5.10)

\pi'_q = \frac{(\pi_q - W_{qq} - W_{pq})\, \pi_q}{\pi_q - W_{qq} - W_{pq}} = \pi_q   (2.5.11)

Therefore, π'_i = π_i for all i.

If the network becomes disconnected as a result of broken edges, or if one or more nodes break, the resulting invariant distribution of the remaining or partial network is not the same as that of the original network, since information is lost in the process. However, the method for adjusting the network described above can still be used to determine the average of the values of the remaining nodes at the time the network was disconnected.

2.6 Performance and Robustness trade-offs

There are many useful measures of performance for consensus algorithms. One such performance measure is the second largest eigenvalue modulus (SLEM) [34][24]. The SLEM is a measure of the worst-case convergence rate, which applies if the initial state is aligned with the second eigenvector; it is also the rate reached once all differences in node states along the other eigenvectors of the system have been sufficiently reduced. Figure 2.6 shows the spectral gap ρ = (1 − SLEM) for multilevel networks with various numbers of levels, where the nodes and edges within each level form a ring. In these networks, every node is connected to its two neighboring nodes within its level, so that each level forms a ring. In addition, each node is connected to one supernode

in the level above. Each supernode in one of the upper levels has the same number of subnodes.

Figure 2.6: Spectral gap vs. number of nodes n in the base level for networks with various numbers of levels N.

As demonstrated in the figure, the spectral gap for a single-level network is inversely proportional to the square of the number of nodes in the network. However, if the number of levels in the network is allowed to vary and is sufficiently large, it scales logarithmically instead. One simple measure of robustness is the connectivity of the network. Additional measures of robustness are necessary to evaluate how the network convergence rate is affected by failures of some edges or nodes that do not lead to parts of the network becoming disconnected. One such measure of robustness is the worst-case spectral gap of a network with a specific number of broken edges or nodes. Another measure of performance that can be used is the inverse of the number of steps t_c required for convergence of node values to within a small error margin of the invariant distribution. Similarly, robustness can be defined as the ratio between the number of steps required for convergence for the intact network and for a network with a number of broken edges or nodes.
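These SLEM-based measures are straightforward to evaluate offline when comparing candidate designs; a minimal sketch (centralized, for design-time evaluation only; chapter 4 discusses computing spectral information on the network itself):

    import numpy as np

    def spectral_gap(P):
        # rho = 1 - SLEM, where the SLEM is the second largest eigenvalue
        # modulus of the state transition matrix (the largest modulus is 1).
        moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
        return 1.0 - moduli[1]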

Figure 2.7: Centralization robustness vs. performance trade-off: worst-case performance under a single node failure.

In constructing a multilevel network, there are a number of parameters one can choose that influence the performance and robustness of the network. The extreme cases are often equivalent to a single-level distributed network, which is very robust but has low performance, or to a network with a single supernode, which has high performance and low robustness. The first choices to make are the number of levels and the ratio of nodes per supernode for each level. The effects of the number of levels on the SLEM for a ring-shaped network are shown in figure 2.6. Figure 2.7 shows an example of the number of time steps required for convergence of a ring-shaped network with two levels and 40 base nodes as a function of the number of supernodes n_2 in the second level. In the case where all nodes are functioning, the convergence rate is lower for networks with more supernodes. However, if any one of the supernodes breaks, the time to convergence increases dramatically for a network with few supernodes, while networks with more supernodes are not affected as much. In this case, adding more than six supernodes to a network does not lead to faster convergence if one of them breaks, since the effect of lowering the convergence rate is larger than the benefit of added

robustness. However, if several nodes malfunction, having additional supernodes can be beneficial.

Figure 2.8: Performance vs. Robustness for various α and β values. The red cross indicates the parameter values chosen for subsequent numerical examples.

While the ideal number of levels depends primarily on the number of nodes in the network, the best ratio between the number of nodes in different levels depends on the expected failure rate of nodes and edges, as well as the desired level of robustness. Additional parameters that have to be chosen are the α and β coefficients in the state transition matrix \hat{P}' (figure 2.2). Selecting large values for the coefficients that govern data exchanges between supernodes and from supernodes to base nodes yields high performance and lower robustness, while giving base-level nodes more weight increases robustness and lowers performance. Figure 2.8 shows the Pareto frontier of all possible combinations of these coefficients for a ring-shaped network with three levels and 64 base-level nodes. The times to convergence for the intact network and for a network with ten broken edges were used to evaluate performance and robustness. The majority of possible combinations of the four α and β parameters in this case are not on the Pareto frontier and should not be selected. Each point on the Pareto frontier represents a different performance-robustness trade-off, and selection

of a specific parameter combination depends on the desired level of performance or robustness.

2.7 Two dimensional numerical example

To demonstrate how the algorithm described above might be used in a real network, a two-dimensional network consisting of 324 randomly positioned nodes was created. The probability of having an edge between any two nodes in the base level was inversely proportional to the square of the distance between the nodes. Two supernode levels were created by dividing the base-level layout into a 6 × 6 grid for the second level, and a 2 × 2 grid for the third level, and selecting the node closest to the center of each grid square to double as a supernode. The layout of this network is shown in figure 2.9.

Figure 2.9: Example network layout (connections between different levels are not shown), where upper levels use larger nodes and thicker edges.

Figure 2.10 shows the convergence of the node values to the invariant distribution

for both the multigrid network and a network consisting of the base level only. As expected, the multigrid network converges significantly faster. Also plotted is a case in which each edge has a probability of 0.5 of being functional at any time step. While this decreases the convergence rate, the multigrid network still performs significantly better than the single-level network.

Figure 2.10: Convergence results for the example network.

2.8 Measurement updates

The method described in the previous sections is applicable to situations where each node takes only one measurement. In many potential applications, the value that is being estimated changes over time, and nodes update their measurements periodically. One option to handle measurement updates would be to restart the consensus process with each new set of sensor measurements. However, this can be ineffective, especially if variations between different sensors are larger than variations of a particular sensor's values over time, since all progress towards consensus based on the previous values would be discarded. In addition, it would require all nodes to perform measurement updates at the same prearranged time, and would not allow for

unscheduled asynchronous updates. The following theorem describes a way of updating the state of a node to incorporate a new measurement without restarting the consensus process. It can also be applied if only some nodes, or just a single node, update their measurements, and unlike restarting the consensus process, it does not require any synchronized action between nodes. The only disadvantage of this method is that nodes need to store their previous measurement values in addition to their current state.

Theorem. Let y be the vector of previous measurement values, and let y' be the vector of updated measurement values. Update the state vector as follows:

x' = x + (y' - y)   (2.8.1)

Then the new invariant distribution reflects the mean of the new measurement values, i.e.

\hat{\pi}' = (P^T)^\infty y'   (2.8.2)

Proof. If the measurement update is performed at time t, then the node states before and after the measurement update are

x(t) = (P^T)^t y   (2.8.3)

x'(t) = (P^T)^t y + (y' - y)   (2.8.4)

The new invariant distribution is

\hat{\pi}' = (P^T)^\infty x' = (P^T)^\infty (P^T)^t y + (P^T)^\infty y' - (P^T)^\infty y = (P^T)^\infty y'   (2.8.5)
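The update rule itself is a one-line local operation (an illustrative sketch; between updates the network keeps iterating x ← P^T x as before):

    import numpy as np

    def measurement_update(x, y_old, y_new):
        # Eq. (2.8.1): fold updated measurements into the running consensus
        # state without restarting. A node whose measurement is unchanged
        # passes y_new equal to y_old and is unaffected; each node must store
        # its previous measurement value.
        return x + (np.asarray(y_new) - np.asarray(y_old))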

2.9 Sensor weights

The methods described above lead to a consensus that reflects the mean of the measurement values of all nodes. In this section, we describe how to adapt the methods to allow for giving nodes unequal weights, so that nodes that have access to more accurate measurements can be given higher weights than nodes with less accurate measurements.

Theorem. Let y_i be a measurement value associated with node i and let φ_i be the weight assigned to it. Then the weighted average of the measurements of all nodes in the network can be found by running two separate consensus processes on variables x and z with initial values as defined below:

x_i(0) = \phi_i y_i   (2.9.1)

z_i(0) = \phi_i   (2.9.2)

The weighted average of the y_i is obtained at each node after both consensus processes converge by dividing x_i by z_i:

\frac{x_i(\infty)}{z_i(\infty)} = \frac{\sum_{k=1}^{n} \phi_k y_k}{\sum_{k=1}^{n} \phi_k}   (2.9.3)

Proof. Applying equation (2.3.1),

x_i(\infty) = \kappa_i \sum_{k=1}^{n} x_k(0)   (2.9.4)

z_i(\infty) = \kappa_i \sum_{k=1}^{n} z_k(0)   (2.9.5)

The factors κ are the same for both consensus processes, so

\frac{x_i(\infty)}{z_i(\infty)} = \frac{\kappa_i \sum_{k=1}^{n} \phi_k y_k}{\kappa_i \sum_{k=1}^{n} \phi_k} = \frac{\sum_{k=1}^{n} \phi_k y_k}{\sum_{k=1}^{n} \phi_k}   (2.9.6)

With this method it is even possible to alter sensor weights from φ_i to new values φ'_i at some time during the consensus process by using the method described in the

previous section and applying equation (2.8.1) to both x and z, i.e.

x'_i = x_i + (\phi'_i y'_i - \phi_i y_i)   (2.9.7)

z'_i = z_i + (\phi'_i - \phi_i)   (2.9.8)

Note that the sensor weights do not need to sum to one here, since we divide by their sum z. This is particularly useful, since it means that a node can change its weight by simply altering its own stored values of x_i and z_i; no additional interaction with other nodes is required. If the noise in the node measurements is expected to be independent for each node and normally distributed, setting the sensor weights equal to the inverse of the variance σ_i² of each node minimizes the overall error:

\phi_i = \frac{1}{\sigma_i^2}   (2.9.9)

In some situations, the nodes might be able to provide an estimate of the accuracy of their measurements that varies over time. The equations above can then be used to update the node weights to reflect this change in the estimated accuracy.

2.10 Network spectral properties and convergence rates

In this section we study how the spectral properties of a network are influenced by adding supernode levels. Results presented in previous sections have shown that multigrid methods can reduce the second-largest eigenvalue λ_2 and thereby increase the spectral gap ρ = 1 − λ_2 of a network. To show the effects on additional eigenvalues and eigenvectors of the network, we start with the example of a ring-shaped network, where the base-level network consists of a simple ring of nodes, and every node has exactly two neighbors. For such a ring-shaped network with n nodes, the eigenvalues are given by the following expression, where k takes values from 0 to n/2 for even n,

and from 0 to (n−1)/2 for odd n:

\lambda_{2k-1} = \cos\left( \frac{k}{n} 2\pi \right)   (2.10.1)

For k larger than 0 and smaller than n, the multiplicity of the eigenvalue is 2. The eigenvectors have the following forms, where \tilde{v}_{k,i} is the i-th entry of the k-th (unnormalized) eigenvector, and the vectors \tilde{v} have to be normalized to obtain the eigenvectors v:

\tilde{v}_{2k-1,i} = \sin\left( \frac{ki}{n} 2\pi \right)   (2.10.2)

\tilde{v}_{2k,i} = \cos\left( \frac{ki}{n} 2\pi \right)   (2.10.3)

The eigenvalue moduli for a ring-shaped network of 400 nodes are shown in figure 2.11. Three of the eigenvectors are shown in figure 2.12. Overall, for a ring-shaped network, eigenvectors corresponding to eigenvalues with high moduli are low-frequency sinusoids, and eigenvectors corresponding to low-modulus eigenvalues are high-frequency sinusoids. If a consensus process is run on such a network, high-spatial-frequency noise is therefore averaged out quickly, while low-spatial-frequency noise persists for a larger number of time steps.

Figure 2.11 also shows the eigenvalues of multigrid networks that use the simple ring-shaped network as their base layer. The eigenvalues shown are for networks with three and six layers. Both multigrid networks have the same number of nodes in the top level, so that the three-layer network represents a relatively centralized network, and the six-layer network represents a more robust network. As expected, the eigenvalues of the multigrid networks are lower in magnitude than those of the single-layer network, with the three-level network having the overall smallest eigenvalues. Most importantly, λ_2 is significantly lower for the multigrid networks.

To study how the eigenvalues and eigenvectors relate to convergence rates, a consensus algorithm was run on the ring-shaped network with the eigenvectors of the single-level ring as an input. For each of the eigenvectors, the process was started with each node initialized to the corresponding entry of the eigenvector, and the


22.4. Numerical Determination of Eigenvalues and Eigenvectors. Introduction. Prerequisites. Learning Outcomes

22.4. Numerical Determination of Eigenvalues and Eigenvectors. Introduction. Prerequisites. Learning Outcomes Numerical Determination of Eigenvalues and Eigenvectors 22.4 Introduction In Section 22. it was shown how to obtain eigenvalues and eigenvectors for low order matrices, 2 2 and. This involved firstly solving

More information

AFRL-RW-EG-TP

AFRL-RW-EG-TP AFRL-RW-EG-TP-2011-018 Relational Information Space for Dynamic Systems Robert A. Murphey Air Force Research Laboratory, Munitions Directorate AFRL/RWG 101 West Eglin Boulevard Eglin AFB, FL 32542-6810

More information

The Lanczos and conjugate gradient algorithms

The Lanczos and conjugate gradient algorithms The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization

More information

Chapter 11. Matrix Algorithms and Graph Partitioning. M. E. J. Newman. June 10, M. E. J. Newman Chapter 11 June 10, / 43

Chapter 11. Matrix Algorithms and Graph Partitioning. M. E. J. Newman. June 10, M. E. J. Newman Chapter 11 June 10, / 43 Chapter 11 Matrix Algorithms and Graph Partitioning M. E. J. Newman June 10, 2016 M. E. J. Newman Chapter 11 June 10, 2016 1 / 43 Table of Contents 1 Eigenvalue and Eigenvector Eigenvector Centrality The

More information

Randomized Gossip Algorithms

Randomized Gossip Algorithms 2508 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 52, NO 6, JUNE 2006 Randomized Gossip Algorithms Stephen Boyd, Fellow, IEEE, Arpita Ghosh, Student Member, IEEE, Balaji Prabhakar, Member, IEEE, and Devavrat

More information

MAA507, Power method, QR-method and sparse matrix representation.

MAA507, Power method, QR-method and sparse matrix representation. ,, and representation. February 11, 2014 Lecture 7: Overview, Today we will look at:.. If time: A look at representation and fill in. Why do we need numerical s? I think everyone have seen how time consuming

More information

Markov Chains and Spectral Clustering

Markov Chains and Spectral Clustering Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

7.2 Steepest Descent and Preconditioning

7.2 Steepest Descent and Preconditioning 7.2 Steepest Descent and Preconditioning Descent methods are a broad class of iterative methods for finding solutions of the linear system Ax = b for symmetric positive definite matrix A R n n. Consider

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

More information

Chapter 7 Iterative Techniques in Matrix Algebra

Chapter 7 Iterative Techniques in Matrix Algebra Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition

More information

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication. CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax

More information

Jordan Journal of Mathematics and Statistics (JJMS) 5(3), 2012, pp A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS

Jordan Journal of Mathematics and Statistics (JJMS) 5(3), 2012, pp A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS Jordan Journal of Mathematics and Statistics JJMS) 53), 2012, pp.169-184 A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS ADEL H. AL-RABTAH Abstract. The Jacobi and Gauss-Seidel iterative

More information

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming

More information

CHAPTER 11. A Revision. 1. The Computers and Numbers therein

CHAPTER 11. A Revision. 1. The Computers and Numbers therein CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of

More information

18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

More information

Linear algebra issues in Interior Point methods for bound-constrained least-squares problems

Linear algebra issues in Interior Point methods for bound-constrained least-squares problems Linear algebra issues in Interior Point methods for bound-constrained least-squares problems Stefania Bellavia Dipartimento di Energetica S. Stecco Università degli Studi di Firenze Joint work with Jacek

More information

Bootstrap AMG. Kailai Xu. July 12, Stanford University

Bootstrap AMG. Kailai Xu. July 12, Stanford University Bootstrap AMG Kailai Xu Stanford University July 12, 2017 AMG Components A general AMG algorithm consists of the following components. A hierarchy of levels. A smoother. A prolongation. A restriction.

More information

Local Search & Optimization

Local Search & Optimization Local Search & Optimization CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 4 Some

More information

Least Squares Approximation

Least Squares Approximation Chapter 6 Least Squares Approximation As we saw in Chapter 5 we can interpret radial basis function interpolation as a constrained optimization problem. We now take this point of view again, but start

More information

Gossip algorithms for solving Laplacian systems

Gossip algorithms for solving Laplacian systems Gossip algorithms for solving Laplacian systems Anastasios Zouzias University of Toronto joint work with Nikolaos Freris (EPFL) Based on : 1.Fast Distributed Smoothing for Clock Synchronization (CDC 1).Randomized

More information

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Math 471 (Numerical methods) Chapter 3 (second half). System of equations Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular

More information

Kasetsart University Workshop. Multigrid methods: An introduction

Kasetsart University Workshop. Multigrid methods: An introduction Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available

More information

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts Some definitions Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 A matrix A is SPD (Symmetric

More information

Comparison of Power Flow Algorithms for inclusion in On-line Power Systems Operation Tools

Comparison of Power Flow Algorithms for inclusion in On-line Power Systems Operation Tools University of New Orleans ScholarWorks@UNO University of New Orleans Theses and Dissertations Dissertations and Theses 12-17-2010 Comparison of Power Flow Algorithms for inclusion in On-line Power Systems

More information

Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices

Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices Key Terms Symmetric matrix Tridiagonal matrix Orthogonal matrix QR-factorization Rotation matrices (plane rotations) Eigenvalues We will now complete

More information

AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN

AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN A Thesis Presented to The Academic Faculty by Bryan Larish In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Instructor: Farid Alizadeh Scribe: Anton Riabov 10/08/2001 1 Overview We continue studying the maximum eigenvalue SDP, and generalize

More information

On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors

On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors Ali Jadbabaie Department of Electrical and Systems Engineering University of Pennsylvania Philadelphia, PA 19104 jadbabai@seas.upenn.edu

More information

DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix to upper-triangular

DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix to upper-triangular form) Given: matrix C = (c i,j ) n,m i,j=1 ODE and num math: Linear algebra (N) [lectures] c phabala 2016 DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information

7. Symmetric Matrices and Quadratic Forms

7. Symmetric Matrices and Quadratic Forms Linear Algebra 7. Symmetric Matrices and Quadratic Forms CSIE NCU 1 7. Symmetric Matrices and Quadratic Forms 7.1 Diagonalization of symmetric matrices 2 7.2 Quadratic forms.. 9 7.4 The singular value

More information

Maximum-weighted matching strategies and the application to symmetric indefinite systems

Maximum-weighted matching strategies and the application to symmetric indefinite systems Maximum-weighted matching strategies and the application to symmetric indefinite systems by Stefan Röllin, and Olaf Schenk 2 Technical Report CS-24-7 Department of Computer Science, University of Basel

More information

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 61, NO. 16, AUGUST 15, Shaochuan Wu and Michael G. Rabbat. A. Related Work

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 61, NO. 16, AUGUST 15, Shaochuan Wu and Michael G. Rabbat. A. Related Work IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 61, NO. 16, AUGUST 15, 2013 3959 Broadcast Gossip Algorithms for Consensus on Strongly Connected Digraphs Shaochuan Wu and Michael G. Rabbat Abstract We study

More information

Optimal Scaling of a Gradient Method for Distributed Resource Allocation 1

Optimal Scaling of a Gradient Method for Distributed Resource Allocation 1 JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 129, No. 3, pp. 469 488, June 2006 ( C 2006) DOI: 10.1007/s10957-006-9080-1 Optimal Scaling of a Gradient Method for Distributed Resource Allocation

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 References 1 L. Freeman, Centrality in Social Networks: Conceptual Clarification, Social Networks, Vol. 1, 1978/1979, pp. 215 239. 2 S. Wasserman

More information

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 Introduction Almost all numerical methods for solving PDEs will at some point be reduced to solving A

More information

Preconditioning via Diagonal Scaling

Preconditioning via Diagonal Scaling Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.

More information

From Stationary Methods to Krylov Subspaces

From Stationary Methods to Krylov Subspaces Week 6: Wednesday, Mar 7 From Stationary Methods to Krylov Subspaces Last time, we discussed stationary methods for the iterative solution of linear systems of equations, which can generally be written

More information

Economics of Networks Social Learning

Economics of Networks Social Learning Economics of Networks Social Learning Evan Sadler Massachusetts Institute of Technology Evan Sadler Social Learning 1/38 Agenda Recap of rational herding Observational learning in a network DeGroot learning

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Simulated Annealing for Constrained Global Optimization

Simulated Annealing for Constrained Global Optimization Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective

More information

SIGNAL STRENGTH LOCALIZATION BOUNDS IN AD HOC & SENSOR NETWORKS WHEN TRANSMIT POWERS ARE RANDOM. Neal Patwari and Alfred O.

SIGNAL STRENGTH LOCALIZATION BOUNDS IN AD HOC & SENSOR NETWORKS WHEN TRANSMIT POWERS ARE RANDOM. Neal Patwari and Alfred O. SIGNAL STRENGTH LOCALIZATION BOUNDS IN AD HOC & SENSOR NETWORKS WHEN TRANSMIT POWERS ARE RANDOM Neal Patwari and Alfred O. Hero III Department of Electrical Engineering & Computer Science University of

More information

Asymptotic Quadratic Convergence of the Parallel Block-Jacobi EVD Algorithm for Hermitian Matrices

Asymptotic Quadratic Convergence of the Parallel Block-Jacobi EVD Algorithm for Hermitian Matrices Asymptotic Quadratic Convergence of the Parallel Block-Jacobi EVD Algorithm for ermitian Matrices Gabriel Okša Yusaku Yamamoto Marián Vajteršic Technical Report 016-01 March/June 016 Department of Computer

More information

arxiv: v1 [math.na] 5 May 2011

arxiv: v1 [math.na] 5 May 2011 ITERATIVE METHODS FOR COMPUTING EIGENVALUES AND EIGENVECTORS MAYSUM PANJU arxiv:1105.1185v1 [math.na] 5 May 2011 Abstract. We examine some numerical iterative methods for computing the eigenvalues and

More information

Iterative Methods. Splitting Methods

Iterative Methods. Splitting Methods Iterative Methods Splitting Methods 1 Direct Methods Solving Ax = b using direct methods. Gaussian elimination (using LU decomposition) Variants of LU, including Crout and Doolittle Other decomposition

More information

EE 381V: Large Scale Optimization Fall Lecture 24 April 11

EE 381V: Large Scale Optimization Fall Lecture 24 April 11 EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that

More information

Markov chains (week 6) Solutions

Markov chains (week 6) Solutions Markov chains (week 6) Solutions 1 Ranking of nodes in graphs. A Markov chain model. The stochastic process of agent visits A N is a Markov chain (MC). Explain. The stochastic process of agent visits A

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Math/Phys/Engr 428, Math 529/Phys 528 Numerical Methods - Summer Homework 3 Due: Tuesday, July 3, 2018

Math/Phys/Engr 428, Math 529/Phys 528 Numerical Methods - Summer Homework 3 Due: Tuesday, July 3, 2018 Math/Phys/Engr 428, Math 529/Phys 528 Numerical Methods - Summer 28. (Vector and Matrix Norms) Homework 3 Due: Tuesday, July 3, 28 Show that the l vector norm satisfies the three properties (a) x for x

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

The convergence of stationary iterations with indefinite splitting

The convergence of stationary iterations with indefinite splitting The convergence of stationary iterations with indefinite splitting Michael C. Ferris Joint work with: Tom Rutherford and Andy Wathen University of Wisconsin, Madison 6th International Conference on Complementarity

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Matrix Assembly in FEA

Matrix Assembly in FEA Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,

More information

arxiv: v2 [math.na] 28 Jan 2016

arxiv: v2 [math.na] 28 Jan 2016 Stochastic Dual Ascent for Solving Linear Systems Robert M. Gower and Peter Richtárik arxiv:1512.06890v2 [math.na 28 Jan 2016 School of Mathematics University of Edinburgh United Kingdom December 21, 2015

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Nearest Correlation Matrix

Nearest Correlation Matrix Nearest Correlation Matrix The NAG Library has a range of functionality in the area of computing the nearest correlation matrix. In this article we take a look at nearest correlation matrix problems, giving

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

Outline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St

Outline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St Structured Lower Rank Approximation by Moody T. Chu (NCSU) joint with Robert E. Funderlic (NCSU) and Robert J. Plemmons (Wake Forest) March 5, 1998 Outline Introduction: Problem Description Diculties Algebraic

More information

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability

More information

Computational Methods. Least Squares Approximation/Optimization

Computational Methods. Least Squares Approximation/Optimization Computational Methods Least Squares Approximation/Optimization Manfred Huber 2011 1 Least Squares Least squares methods are aimed at finding approximate solutions when no precise solution exists Find the

More information

PETROV-GALERKIN METHODS

PETROV-GALERKIN METHODS Chapter 7 PETROV-GALERKIN METHODS 7.1 Energy Norm Minimization 7.2 Residual Norm Minimization 7.3 General Projection Methods 7.1 Energy Norm Minimization Saad, Sections 5.3.1, 5.2.1a. 7.1.1 Methods based

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information