Distributed Data Fusion with Kalman Filters Simon Julier Computer Science Department University College London S.Julier@cs.ucl.ac.uk
Structure of Talk
- Motivation
- Kalman Filters
- Double Counting
- Optimal Distributed Data Fusion
- Suboptimal Distributed Data Fusion
- Probabilistic Interpretation of WGMs
- Summary
Motivation
Wilderness Search and Rescue
Wilderness Search and Rescue
The priority is search that is fast and safe:
- You have to find somebody before you can rescue them
- Time is often of the essence
- The safety of the searchers must be ensured
UAVs are an ideal tool:
- They rapidly collect data over a wide area
- They can go where it's dangerous
- Automated platforms can operate even faster
UAV Search Task
- Fly around the environment
- Control based on the prior distribution of the target location
- Detect potentially interesting objects
- Localise potentially interesting objects
- Identify potentially interesting objects
Autonomous Platforms
Raw Sensor Data Returned
Target Localisation
Estimate the state of a target in an area of interest using multiple UAVs
Tracking with Multiple UAVs
Goals of Lecture
- What causes the algorithms to fail so badly?
- How can we solve these problems optimally, and what constraints does this place on the solution?
- How can we solve the problem suboptimally, and what constraints does this place on the problem?
Kalman Filters
System Description
Let the state of the system at time step $k$ be given by the state vector $x_k$. Both the process and observation models are linear and are of the form:
$x_k = F_{k-1} x_{k-1} + v_k, \qquad z_k = H_k x_k + w_k$
where $v_k \sim N(0, Q_k)$ is the process noise and $w_k \sim N(0, R_k)$ is the observation noise.
Structure of the State Estimate
The estimate consists of the tuple $(\hat{x}, P)$, where:
- $\hat{x}$ is the mean vector: really a single numerical estimate, a bit like a MAP estimate
- $P$ is the covariance: really a measure of mean squared error
Valid Estimates
We need a criterion which says whether our system works or not. Since we only have a state vector and a covariance matrix, our criterion of success is that our estimate is covariance consistent.
Covariance Consistency
We say that an estimate is conservative if it overestimates the actual mean squared error in the estimate:
$P - E[\tilde{x}\tilde{x}^T] \ge 0$
where $\tilde{x} = x - \hat{x}$ is the estimation error.
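To make the definition concrete, here is a minimal numerical sketch (pure NumPy; all numbers hypothetical): simulate a population of scalar estimation errors and compare a reported covariance against the realised mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated estimation errors with true variance 1.0 (hypothetical numbers).
errors = rng.normal(0.0, 1.0, 100_000)
actual_mse = np.mean(errors**2)

# An estimate is conservative if the covariance it reports is at least
# the actual mean squared error. (P = 1.0 is the matched case and sits
# right on the boundary, so sampling noise can tip it either way.)
for P_reported in (0.5, 1.0, 2.0):
    verdict = "conservative" if P_reported >= actual_mse else "inconsistent"
    print(f"reported P = {P_reported:.1f}, actual MSE = {actual_mse:.3f}: {verdict}")
```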
[Figures: covariance ellipses for the matched, conservative, consistent, and inconsistent cases]
Biasedness
Authors often stipulate that the error must be zero-mean. However, this is overly restrictive: any system with modelling errors cannot be consistent under this definition, and since all models of the real world are wrong, no filter could ever be consistent.
Biasedness
Suppose the error has a non-zero mean (a bias): $E[\tilde{x}] = b \ne 0$. Then the mean squared error decomposes as
$E[\tilde{x}\tilde{x}^T] = C + b b^T$
where $C$ is the covariance of the error about its mean. Therefore, for the estimate to be consistent, the reported covariance must dominate both terms:
$P \ge C + b b^T$
[Figure: a matched but biased estimate]
Non-Gaussian Noise Models
The definition applies equally if the distribution is non-Gaussian: in this case, we simply compute the first two moments of the distribution.
Non-Gaussian Noise Models [figure]
Comparison with Entropy
One could argue that, instead of covariance consistency, we should use a measure like entropy. However:
- To compute entropy we have to assume a distribution (e.g., Gaussian)
- Even with this assumption, it won't tell us if our estimate is too small
Entropy-Matched Distributions
First Iteration of the Kalman Filter Cycle
Initialise → Predict → Observation → Update
Prediction Step
Under the assumption that the errors are small, linearised prediction equations are used:
$\hat{x}_{k|k-1} = F_{k-1}\hat{x}_{k-1|k-1}$
$P_{k|k-1} = F_{k-1} P_{k-1|k-1} F_{k-1}^T + Q_{k-1}$
Kalman Filter Update Step
We (arbitrarily) choose a linear update rule of the form
$\hat{x}_{k|k} = \hat{x}_{k|k-1} + W_k \nu_k$
where $\nu_k = z_k - H_k\hat{x}_{k|k-1}$ is the innovation vector.
The weight matrix $W_k$ is chosen to minimise the trace of the updated covariance matrix. It can be shown that this is
$W_k = P_{k|k-1} H_k^T S_k^{-1}, \qquad S_k = H_k P_{k|k-1} H_k^T + R_k$
and this gives a covariance update equation of the form
$P_{k|k} = P_{k|k-1} - W_k S_k W_k^T$
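As a concrete sketch of the predict/update cycle above, a minimal NumPy implementation (the 1D model and the numbers are hypothetical):

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Linear prediction step: propagate mean and covariance."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    """Linear update step: fuse an observation z with the prediction."""
    nu = z - H @ x                      # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    W = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    return x + W @ nu, P - W @ S @ W.T

# Hypothetical 1D constant-position example.
x, P = np.array([0.0]), np.array([[10.0]])
F, Q = np.array([[1.0]]), np.array([[0.1]])
H, R = np.array([[1.0]]), np.array([[1.0]])

x, P = kf_predict(x, P, F, Q)
x, P = kf_update(x, P, np.array([1.2]), H, R)
print(x, P)   # posterior mean and covariance
```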
Debiased Coordinate Conversions
A sensor measures the range $r$ and bearing $\theta$ to a target. The aim is to estimate the $(x, y)$ coordinates of the target. The measurement model is, of course:
$x = r\cos\theta, \qquad y = r\sin\theta$
Simply substituting the noisy measured values into this conversion yields a biased estimate: if the bearing noise is Gaussian with variance $\sigma_\theta^2$, then $E[\cos\theta_m] = e^{-\sigma_\theta^2/2}\cos\theta$, so the naive conversion systematically shrinks the expected radial distance. The debiased conversion compensates for this factor.
Debiased Coordinate Conversions in Action [figure]
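A quick Monte Carlo sketch (hypothetical numbers) that exhibits the bias of the naive conversion:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true target: range 100 m, bearing 0 rad.
r_true, theta_true = 100.0, 0.0
sigma_r, sigma_theta = 1.0, 0.3   # large bearing noise makes the bias visible

n = 200_000
r_m = r_true + sigma_r * rng.normal(size=n)
theta_m = theta_true + sigma_theta * rng.normal(size=n)

# Naive conversion: substitute the noisy measurements directly.
x_naive = r_m * np.cos(theta_m)
print("true x:", r_true * np.cos(theta_true))            # 100.0
print("mean of naive conversion:", x_naive.mean())       # ~95.6
print("predicted biased mean:", r_true * np.exp(-sigma_theta**2 / 2))
```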
[Mapping Example]
Extended Kalman Filter
A common assumption is that the errors are small. Therefore, the first two moments are approximated by linearising about the current estimate:
$\hat{x}_{k|k-1} \approx f(\hat{x}_{k-1|k-1})$
$P_{k|k-1} \approx \nabla f \, P_{k-1|k-1} \, \nabla f^T + Q_{k-1}$
where $\nabla f$ is the Jacobian of the process model. Various kinds of analytical and numerical moment approximations are widely used as well.
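A small sketch of why this is only an approximation, comparing the linearised moments against Monte Carlo "ground truth" for a hypothetical scalar nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scalar nonlinearity f(x) = x**2 with a Gaussian prior.
mu, var = 1.0, 0.5
f = lambda x: x**2
jac = 2 * mu                        # df/dx evaluated at the mean

# Linearised (EKF-style) moment approximation.
mean_lin = f(mu)
var_lin = jac * var * jac

# Monte Carlo moments.
xs = mu + np.sqrt(var) * rng.normal(size=1_000_000)
print("linearised :", mean_lin, var_lin)            # 1.0, 2.0
print("Monte Carlo:", f(xs).mean(), f(xs).var())    # ~1.5, ~2.5
```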
Double Counting
Distributed Fusion Kalman Filter Cycle
Initialise → Predict → Update (local observation) → Broadcast Estimate → Update (remote estimates from other nodes)
The broadcast and remote-update steps are additional steps due to distribution.
Basic Idea of DDF
Each platform maintains its own estimate of the target state. Each node runs a Kalman filter locally and fuses locally taken measurements. The updated estimate is distributed to the other nodes, which fuse it with their own.
However, There Is A Slight Complication
The state information stored in each node is not independent of the information in other nodes:
- Common process noise: occurs whether or not nodes have exchanged information
- Common measurement history: occurs when nodes exchange information
Assuming that state estimates are independent of one another is bad. However, this assumption is often used in the so-called weak coupling of, say, GPS and INS systems.
Dependent Information and Information Sets
Each node collects its own set of data, which is independent of the data at the other node
Fusion of Information Sets Estimates (information sets) exchanged between nodes
Fusion of Independent Information Sets
When Do Independent Sets Arise?
Independent sets arise when the information delivered to each node is conditionally independent. This can only happen if you can guarantee:
- The target is stationary
- The poses of the UAVs are known perfectly
- The same observation information is only ever used once
Multiple Platform Fusion
Dependent Information Sets
Both sets now contain common information; they are no longer conditionally independent
Fusion of Dependent Information Sets [figure: new information, common information, new information]
Assuming Conditional Independence [figure: the common information is double counted]
Double Counting in State Space Form
Within a Kalman filter, the dependency manifests through the values of the cross correlations. These can be evaluated by considering the state of the entire network, including the joint state of all platforms, all objects being tracked, etc.
Double Counting in State Space Form
The full covariance structure of this joint estimate contains the cross-correlation blocks:
$P = \begin{pmatrix} P_{11} & P_{12} & \cdots \\ P_{21} & P_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$
However, if we only maintain the marginals for each platform separately, the off-diagonal blocks are implicitly replaced by zero:
$P^{*} = \begin{pmatrix} P_{11} & 0 & \cdots \\ 0 & P_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$
The error in the approximate covariance matrix, $P^{*} - P$, contains the neglected cross-correlation terms and is not guaranteed to be positive semidefinite. Therefore, we are using an inconsistent approximation of our network and our system will fail.
Assuming State Estimates Are Independent
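A scalar sketch makes the failure concrete (all numbers hypothetical): two nodes share a common prior, node A fuses one observation and broadcasts its posterior, and node B naively treats that posterior as independent of its own prior, so the shared prior information is counted twice.

```python
P_prior = 4.0   # both nodes share this prior (common information)
R = 1.0         # node A's observation noise variance

info_A = 1 / P_prior + 1 / R      # node A's posterior information
P_correct = 1 / info_A            # what node B *should* end up with: 0.8

# Naive fusion: B treats A's posterior as independent of its own prior,
# so the shared prior information 1/P_prior is counted twice.
info_naive = 1 / P_prior + info_A
P_naive = 1 / info_naive          # ~0.667 -- spuriously overconfident

print(P_correct, P_naive)
```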
Optimal Distributed Data Fusion
The Right Way to Solve the Problem
Chong and Mori showed that fusion can be implemented with a modified form of Bayes Rule:
$p(x \mid Z_i \cup Z_j) \propto \dfrac{p(x \mid Z_i)\, p(x \mid Z_j)}{p(x \mid Z_i \cap Z_j)}$
The division cancels out the common information between the nodes. However, the common information term can only be computed locally with special network topologies.
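For Gaussian estimates, the division by the common term becomes a subtraction in information form (a sketch, written with the information variables $Y = P^{-1}$, $y = P^{-1}\hat{x}$ introduced on the information-filter slide below):

```latex
% Gaussian instance of the Chong--Mori fusion rule:
% divide densities  <=>  subtract common information.
\hat{Y}_{i \cup j} = Y_i + Y_j - Y_{i \cap j}, \qquad
\hat{y}_{i \cup j} = y_i + y_j - y_{i \cap j}
```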
Approach 1: Distribute Observations
Broadcast all observations to all nodes
Pros and Cons
Advantages:
- Each node has the optimal estimate at all times
- Distribution adds no complexity to the fusion algorithm
- Actually used in practice
Disadvantages:
- Requires all nodes to have the same communication and computational abilities
- Requires extremely large bandwidth
- Introduces the implicit assumption that all nodes have exactly the same estimate
Approach 2: Fully-Connected Network
Broadcast all updated state estimates to all nodes
Fully-Connected Networks
The easiest way to implement a fully connected network is to use the inverse covariance (or information) form of the Kalman Filter. The state estimate is replaced by the information variables:
$Y_{k|k} = P_{k|k}^{-1}, \qquad y_{k|k} = P_{k|k}^{-1}\hat{x}_{k|k}$
Updating in Information Form
Using the information form, the update simplifies to
$y_{k|k} = y_{k|k-1} + i_k, \qquad Y_{k|k} = Y_{k|k-1} + I_k$
where the information from the observation is
$i_k = H_k^T R_k^{-1} z_k, \qquad I_k = H_k^T R_k^{-1} H_k$
Distributed Information Updates
Since the observation information terms $i_k$ and $I_k$ do not depend on previous state estimates, they can be safely distributed. For $N$ nodes, the update rule simply becomes
$y_{k|k} = y_{k|k-1} + \sum_{n=1}^{N} i_k^{(n)}, \qquad Y_{k|k} = Y_{k|k-1} + \sum_{n=1}^{N} I_k^{(n)}$
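A minimal NumPy sketch of this additive fusion (the 2D state and the three observations are hypothetical):

```python
import numpy as np

def obs_information(H, R, z):
    """Information contribution of one observation: (i, I)."""
    Rinv = np.linalg.inv(R)
    return H.T @ Rinv @ z, H.T @ Rinv @ H

# Prior in information form.
Y = np.eye(2) * 0.1                  # information matrix
y = Y @ np.array([0.0, 0.0])         # information vector

# Three nodes, each observing a different linear function of the state.
observations = [
    (np.array([[1.0, 0.0]]), np.eye(1), np.array([1.1])),
    (np.array([[0.0, 1.0]]), np.eye(1), np.array([-0.4])),
    (np.array([[1.0, 1.0]]), np.eye(1) * 2.0, np.array([0.9])),
]

# Each node broadcasts (i_n, I_n); every node just sums them.
for H, R, z in observations:
    i_n, I_n = obs_information(H, R, z)
    y, Y = y + i_n, Y + I_n

P = np.linalg.inv(Y)                 # recover covariance...
print(P @ y, P)                      # ...and state estimate
```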
Fully-Connected Network
Advantages:
- Each node has the optimal estimate at all times
- Broadcasting the observation information variables potentially saves bandwidth
Disadvantages:
- Requires all nodes to have the same communication and computational abilities
- Still requires O(N^2) communication links
- Introduces the explicit assumption that all nodes have exactly the same estimate (important if linearising, e.g., with an EKF)
Approach 3: Hierarchical Network
- The network has master and slave nodes
- Slaves fuse data locally
- Estimates are sent to the master, which fuses them together
- The revised estimate is broadcast back to the slaves
Fusion in the Slave
Slave $s$ updates using the information Kalman filter equations:
$y^{(s)}_{k|k} = y^{(s)}_{k|k-1} + i^{(s)}_k, \qquad Y^{(s)}_{k|k} = Y^{(s)}_{k|k-1} + I^{(s)}_k$
Fusion in the Master
The master updates by summing the information from all the slaves. To compensate for the prediction which was sent out to each slave, the master must subtract out this common information:
$y_{k|k} = y_{k|k-1} + \sum_s \left( y^{(s)}_{k|k} - y_{k|k-1} \right), \qquad Y_{k|k} = Y_{k|k-1} + \sum_s \left( Y^{(s)}_{k|k} - Y_{k|k-1} \right)$
Hierarchical Network
Advantages:
- Each node has the optimal estimate at all times
- The number of communication links is O(N)
Disadvantages:
- Additional latency
- One node is privileged; failure of that node causes the whole network to fail
Approach 4: Channel Filters
- Constrain the network to be a tree: a single path between any pair of nodes
- Use channel filters to subtract off the common information
Estimating Common Information
Consider a link between a pair of nodes $i$ and $j$. The channel filter maintains the common information that has passed across that link. It has its own information estimate, $(y_{ij}, Y_{ij})$.
Updating Local Nodes
The channel filter is a regular Kalman filter, but it operates on the information exchanged between $i$ and $j$ rather than on the observation data directly. First, let the update at node $i$ using the local sensor observations be written as
$y_{i,k|k} = y_{i,k|k-1} + i_k, \qquad Y_{i,k|k} = Y_{i,k|k-1} + I_k$
Fusing With Nearby Nodes
The updated estimate is given by summing all the independent information from a node's neighbours $N(i)$, with the channel information subtracted so that nothing is double counted:
$\hat{y}_i = y_i + \sum_{j \in N(i)} \left( y_j - y_{ij} \right), \qquad \hat{Y}_i = Y_i + \sum_{j \in N(i)} \left( Y_j - Y_{ij} \right)$
Updating the Channel Filters
The channel filter update is given by recursively updating with the information variables from the two nodes: after an exchange, the common information on the link is exactly the fused estimate on that link,
$y_{ij} \leftarrow y_i + y_j - y_{ij}, \qquad Y_{ij} \leftarrow Y_i + Y_j - Y_{ij}$
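A scalar sketch of the bookkeeping on one link (hypothetical numbers; the information vectors are omitted for brevity):

```python
# All quantities are scalar information values.
Y_i, Y_j = 1.25, 1.5    # each node's local information (prior + own obs)
Y_ij = 0.25             # channel filter: information the nodes already share

# Fusion at node i: add the neighbour's information, subtract the common part.
Y_i_fused = Y_i + (Y_j - Y_ij)   # 2.5 -- nothing double counted
# Node j does the same and reaches the same answer.
Y_j_fused = Y_j + (Y_i - Y_ij)   # 2.5

# The channel filter now records everything the two nodes share.
Y_ij = Y_i + Y_j - Y_ij          # 2.5

print(Y_i_fused, Y_j_fused, Y_ij)
```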
Channel Filters in Action
Advantages and Disadvantages
Advantages:
- The number of communication links is O(N)
- Optimal in a time-delayed sense
Disadvantages:
- Estimates at all nodes differ
- Single path of communication; no redundancy
- If the network is reconfigured, the channel filters have to be recalculated from scratch
- Requires global time synchronisation
Hybrid Architectures
Channel filters can be mixed and matched with other local topologies, such as observation distribution or master-slave.
Review of Techniques So Far
It is possible to develop optimal algorithms for distributed data fusion using local message passing only. However, these techniques rely on special network topologies:
- Fully connected
- Tree connected
In general, preserving these topologies can be difficult and undesirable.
Ad Hoc Networks
- Arbitrary network with loops and cycles
- Complete flexibility and redundancy
Distributed Data Fusion in Ad Hoc Networks
However, it has been shown that no local data fusion scheme can produce consistent, optimal estimates in this situation. Therefore, it appears that optimal DDF is strongly limited to very particular data fusion architectures. Alternative approach: can we develop mathematically rigorous suboptimal solutions?
Suboptimal Distributed Data Fusion
Double Counting in State Space Form
Recall again that the problem is that we want to know the full joint covariance, including the cross terms $P_{ij}$, but we only know the marginal terms $P_{ii}$ for each platform.
Double Counting in State Space Form
From knowledge of the marginals alone, it is not possible to reconstruct the full joint covariance matrix. However, because the joint covariance matrix must be positive semidefinite, there are constraints on what the cross correlations can look like. Therefore, we can exploit these constraints to develop update rules which are consistent for any feasible cross correlation.
The Kalman Filter with Correlated Noise
First consider the case in which the observation noise is not independent of the filter state, so that $P_{xw} = E[\tilde{x} w^T] \ne 0$. It can be shown that the innovation statistics pick up extra cross terms:
$E[\nu\nu^T] = H P H^T + R + H P_{xw} + P_{xw}^T H^T$
Properties of Updated Covariances
Applying the Results to Fusion
The update which generates a family of ellipses circumscribing the intersection region is given by
$P_c^{-1} = \omega P_a^{-1} + (1 - \omega) P_b^{-1}$
$P_c^{-1} \hat{x}_c = \omega P_a^{-1} \hat{x}_a + (1 - \omega) P_b^{-1} \hat{x}_b$
This is the same as a Kalman filter update, but with the covariances inflated to $P_a / \omega$ and $P_b / (1 - \omega)$.
Covariance Intersection
Choosing ω
The free parameter $\omega \in [0, 1]$ is used to trade off between the prediction and the observation. It should be chosen to minimise some measure of uncertainty in the estimate:
- Trace
- Determinant (better)
The optimisation is convex in $\omega$, so many simple solver algorithms can be used. Some closed-form solutions have been developed as well.
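A minimal sketch of the full CI update, with ω chosen by a bounded scalar solver to minimise the determinant of the fused covariance (the 2D estimates are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def covariance_intersection(xa, Pa, xb, Pb):
    """Fuse two estimates with unknown cross correlation via CI."""
    Ya, Yb = np.linalg.inv(Pa), np.linalg.inv(Pb)

    # Determinant of the fused covariance as a function of omega.
    def fused_cov_det(w):
        return np.linalg.det(np.linalg.inv(w * Ya + (1 - w) * Yb))

    w = minimize_scalar(fused_cov_det, bounds=(0.0, 1.0), method="bounded").x
    Pc = np.linalg.inv(w * Ya + (1 - w) * Yb)
    xc = Pc @ (w * Ya @ xa + (1 - w) * Yb @ xb)
    return xc, Pc, w

# Hypothetical estimates with complementary uncertainty shapes.
xa, Pa = np.array([1.0, 0.0]), np.diag([1.0, 4.0])
xb, Pb = np.array([0.0, 1.0]), np.diag([4.0, 1.0])
xc, Pc, w = covariance_intersection(xa, Pa, xb, Pb)
print(w, xc, Pc)
```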
Covariance Intersection in Action
Probabilistic Interpretation of WGMs
Probabilistic Interpretation
Great, so this works with means and covariances; but is it actually doing something valid from a probability distribution point of view? It turns out that a generalisation of CI is equivalent to computing the weighted geometric mean (WGM) of the fused distributions:
$p_c(x) \propto p_a(x)^{\omega}\, p_b(x)^{1-\omega}, \qquad 0 \le \omega \le 1$
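A quick check of the equivalence for Gaussians: raising a Gaussian to a power rescales its information matrix, so the WGM of two Gaussians reproduces the CI equations (normalisation constants omitted):

```latex
\mathcal{N}(x;\hat{x}_a,P_a)^{\omega}\,\mathcal{N}(x;\hat{x}_b,P_b)^{1-\omega}
\;\propto\;
\exp\!\Big(-\tfrac{1}{2}\big[
  \omega (x-\hat{x}_a)^T P_a^{-1}(x-\hat{x}_a)
  + (1-\omega)(x-\hat{x}_b)^T P_b^{-1}(x-\hat{x}_b)\big]\Big)
```

which is a Gaussian with $P_c^{-1} = \omega P_a^{-1} + (1-\omega)P_b^{-1}$ and $P_c^{-1}\hat{x}_c = \omega P_a^{-1}\hat{x}_a + (1-\omega)P_b^{-1}\hat{x}_b$: exactly the CI update.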
WGM Does Not Double Count
Because the exponents sum to one, information common to both distributions enters the product with total exponent $\omega + (1 - \omega) = 1$: it is single counted.
Structure of the Fusion Rule
If each node's distribution factors into new and common information, $p_a \propto q_a\, p_{\mathrm{com}}$ and $p_b \propto q_b\, p_{\mathrm{com}}$, the fusion rule has the form
$p_a^{\omega}\, p_b^{1-\omega} \propto \left( q_a^{\omega}\, q_b^{1-\omega} \right) p_{\mathrm{com}}$
a function of the new information multiplied by the common information, which is single counted.
Information Losses and Gains
Therefore, we now need to ask what the effect of the weighting ω is. We can assess this in several ways:
- By observation
- Pointwise bounds
- Information measures (surprisingly hard; still a work in progress)
Example Distributions / Effect of ω
[Figures: example prior distributions and the WGM for a sequence of values of ω]
Pointwise Bounds
It is possible to establish pointwise bounds which apply at each point of the distribution. Although pointwise bounds play no special role in Bayesian statistics, they provide some insight into the behaviour of the fusion rule.
Bounds for the Unnormalised Distribution
Let $q(x) = p_a(x)^{\omega}\, p_b(x)^{1-\omega}$ be the unnormalised fused distribution. This is always squeezed between the two distributions:
$\min\{p_a(x), p_b(x)\} \le q(x) \le \max\{p_a(x), p_b(x)\}$
Illustration of the Unnormalised Bound
Lower Bound
Consider the normalised distribution $p_c(x) = q(x)/Z$ where $Z = \int q(x)\, dx$. By Hölder's inequality, $Z \le 1$, so the WGM obeys the lower bound
$p_c(x) \ge q(x) \ge \min\{p_a(x), p_b(x)\}$
Illustration of the Lower Bound
Interpreting the Lower Bound
The minimum value of a distribution plays no special role in Bayesian statistics. However, the bound from below avoids degenerate cases: the support of the fused distribution has to contain the intersection of the supports of the prior distributions. Lower bounds on distributions often play a role in practical filtering algorithms, e.g., truncating distributions or modes in MHT if the probability is too small.
Upper Inequality
There can exist an $x$ such that
$p_c(x) > \max\{p_a(x), p_b(x)\}$
The fact that the fused distribution can exceed the maximum suggests that fusion can occur: the distribution becomes more concentrated.
Illustration of the Upper Inequality
Updated Distribution
Summary
Summary
- Distributed data fusion is important for many applications
- However, estimates are not conditionally independent
- Optimal solutions can be used only in limited circumstances
- Suboptimal algorithms can be used more widely
- The KF is more than Bayes with Gaussians!