Anytime Planning for Decentralized Multi-Robot Active Information Gathering


Anytime Planning for Decentralized Multi-Robot Active Information Gathering

Brent Schlotfeldt¹, Dinesh Thakur¹, Nikolay Atanasov², Vijay Kumar¹, George Pappas¹

¹ GRASP Laboratory, University of Pennsylvania, Philadelphia, PA, USA
² Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA

May 17, 2018

What is the problem?

Suppose a robot at x_0 wants to plan a path to x_T in order to track a target y_T.

Overview

1. Active Information Gathering
   - Problem Statement
   - Reduction to Open-Loop Control, and the Optimal Solution
   - (ε, δ)-Reduced Value Iteration
   - Anytime Reduced Value Iteration
2. Multiple Robots
   - Coordinate Descent
   - Distributed Estimation
   - Application: Target Tracking

Problem Statement

Consider a team of n mobile robots with sensor motion models:

    x_{i,t+1} = f_i(x_{i,t}, u_{i,t}), \quad i \in \{1, \dots, n\}    (1)

where x_{i,t} ∈ X, u_{i,t} ∈ U, and U is a finite set.

The robots' goal is to track a target y_t with target motion model:

    y_{t+1} = A y_t + w_t, \quad w_t \sim \mathcal{N}(0, W), \quad t = 0, \dots, T    (2)

The robots sense the target using on-board sensors obeying the observation model:

    z_{i,t} = H(x_{i,t}) y_t + v_t(x_{i,t}), \quad v_t \sim \mathcal{N}(0, V(x_{i,t}))    (3)

Note the linear, Gaussian assumptions on the target and the observations, but not on the robots. The set U of admissible controls is finite. To simplify notation, we refer to the state of the team at time t as x_t.
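A minimal simulation sketch of models (1)-(3). All numerical values, the grid control set, and the distance-dependent noise V are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.5  # time step (assumed)

# (1) Robot motion model: a planar robot with a finite control set U.
U = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
     np.array([0.0, 1.0]), np.array([0.0, -1.0])]

def f(x, u):
    """x_{t+1} = f(x_t, u_t): move by one motion primitive."""
    return x + tau * u

# (2) Linear-Gaussian target model: y_{t+1} = A y_t + w_t, w_t ~ N(0, W).
A = np.eye(2)
W = 0.01 * np.eye(2)

# (3) Linear observation with state-dependent noise, as in (3):
# here the noise grows with the robot-target distance (an assumption).
H = np.eye(2)

def V(x, y):
    return (0.1 + 0.5 * np.linalg.norm(x - y) ** 2) * np.eye(2)

x, y = np.zeros(2), np.array([2.0, 1.0])
for t in range(5):
    u = U[rng.integers(len(U))]  # a random admissible control
    x = f(x, u)
    y = A @ y + rng.multivariate_normal(np.zeros(2), W)
    z = H @ y + rng.multivariate_normal(np.zeros(2), V(x, y))
    print(t, x, z)
```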

Active Information Gathering

Problem (Active Information Acquisition). Given an initial sensor pose x_0 ∈ X, a prior distribution over the target state y_0, and a finite planning horizon T, choose a sequence of control actions u_0, …, u_{T−1} ∈ U^T that minimizes the conditional entropy h between the target states y_{1:T} and the measurement set z_{1:T}:

    \min_{u \in U^T} J_T^{(n)}(u) := \sum_{t=1}^{T} h(y_t \mid z_{1:t}, x_{1:t})    (4)

    s.t.  x_{t+1} = f(x_t, u_t), \quad t = 0, \dots, T-1
          y_{t+1} = A y_t + w_t, \quad t = 0, \dots, T-1    (5)
          z_t = H(x_t) y_t + v_t, \quad t = 0, \dots, T

Note that many other information measures are possible, such as mutual information, minimum mean-squared error, and others.¹

¹ Nikolay Atanasov et al. "Information acquisition with sensing robots: Algorithms and error bounds". In: 2014 IEEE International Conference on Robotics and Automation (ICRA). 2014.

Reduction to Open-Loop Control

The prior on y_0 is Gaussian, N(ȳ_0, Σ_0) ⟹ the Kalman filter is the optimal estimator. The objective is proportional to log det, i.e.

    h(y_t \mid z_{1:t}, x_{1:t}) \propto \log\det(\Sigma_t)

Σ_t is independent of the measurement realizations z_{1:t} ⟹ open-loop control is optimal!²

Problem (Active Information Acquisition).

    \min_{\mu \in U^T} J_T^{(n)}(\mu) := \sum_{t=1}^{T} \log\det(\Sigma_t)

    s.t.  x_{t+1} = f(x_t, u_t), \quad t = 0, \dots, T-1
          \Sigma_{t+1} = \rho^e_{t+1}(\rho^p_t(\Sigma_t), x_{t+1}), \quad t = 0, \dots, T-1

    Update:   \rho^e_t(\Sigma, x) := \Sigma - K_t(\Sigma, x) H_t(x) \Sigma
              K_t(\Sigma, x) := \Sigma H_t(x)^T \big(H_t(x) \Sigma H_t(x)^T + V_t(x)\big)^{-1}
    Predict:  \rho^p_t(\Sigma) := A_t \Sigma A_t^T + W_t

² Jerome Le Ny and George J. Pappas. "On trajectory optimization for active sensing in Gaussian process models". In: 2009 IEEE Conference on Decision and Control (CDC). 2009.
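The covariance recursion transcribes directly into code. The helper names and the scalar sanity values below are assumptions; the predict/update algebra follows the slide:

```python
import numpy as np

def rho_p(Sigma, A, W):
    # Predict: rho^p(Sigma) = A Sigma A^T + W
    return A @ Sigma @ A.T + W

def rho_e(Sigma, H, V):
    # Update: rho^e(Sigma, x) = Sigma - K H Sigma, with Kalman gain K.
    K = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + V)
    return Sigma - K @ H @ Sigma

def trajectory_cost(Sigma0, A, W, Hs, Vs):
    """Sum of log det(Sigma_t) along one candidate trajectory.

    Hs[t], Vs[t] encode the (x-dependent) observation model at each step.
    Note the recursion never touches the measurements z: this is exactly
    why open-loop planning is optimal here."""
    Sigma, cost = Sigma0, 0.0
    for H, V in zip(Hs, Vs):
        Sigma = rho_e(rho_p(Sigma, A, W), H, V)  # Sigma_{t+1}
        cost += np.linalg.slogdet(Sigma)[1]
    return cost
```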

Optimal Solution

The deterministic optimal control problem can be solved with forward value iteration, which builds a search tree from (x_0, Σ_0). [Figure: search tree rooted at (x_0, Σ_0) with branches u¹, u², u³.] The depth of the tree is the length of the planning horizon, and the branching factor is the number of motion primitives. A full search has complexity O(|U|^T). Can we do better?
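The exhaustive O(|U|^T) search is a plain enumeration of control sequences. A self-contained sketch on an assumed 1-D toy instance (single-integrator robot, scalar target covariance, sensor noise growing with distance to the target at 0):

```python
import itertools
import numpy as np

def brute_force_plan(x0, Sigma0, U, f, step_cov, T):
    """Enumerate all |U|^T control sequences; return (best cost, best sequence).

    step_cov(Sigma, x) applies one predict+update of the Kalman covariance
    at robot state x (a stand-in for rho^e(rho^p(Sigma), x))."""
    best = (np.inf, None)
    for seq in itertools.product(U, repeat=T):
        x, Sigma, cost = x0, Sigma0, 0.0
        for u in seq:
            x = f(x, u)
            Sigma = step_cov(Sigma, x)
            cost += np.linalg.slogdet(Sigma)[1]
        best = min(best, (cost, seq))
    return best

def f(x, u):
    return x + u  # 1-D single integrator (toy)

def step(S, x):
    pred = S + 0.01 * np.eye(1)   # predict with W = 0.01
    V = 0.1 + x ** 2              # noise grows with distance to target at 0
    return np.linalg.inv(np.linalg.inv(pred) + np.eye(1) / V)

cost, seq = brute_force_plan(2.0, np.eye(1), [-1.0, 0.0, 1.0], f, step, T=3)
print(cost, seq)  # best plan drives the robot toward the target and stays
```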

What if trajectories cross?

Want: conditions on Σ_t to eliminate nodes when trajectories cross at level t of the tree. [Figure: three trajectories from x_0 reaching the same state at level 2, with covariances Σ¹, Σ², Σ³.]

If there is a convex combination of Σ¹ and Σ³ that dominates Σ², i.e. Σ² ⪰ aΣ¹ + (1 − a)Σ³, then Σ² can be safely removed from the tree.

ε-Algebraic Redundancy

Want: conditions on Σ_t to eliminate nodes when trajectories cross:³

Definition (ε-Algebraic Redundancy). Let ε ≥ 0 and let {Σ_i}_{i=1}^K ⊂ S_n^+ be a finite set. A matrix Σ ∈ S_n^+ is ε-algebraically redundant with respect to {Σ_i} if there exist nonnegative constants {α_i}_{i=1}^K such that:

    \sum_{i=1}^{K} \alpha_i = 1 \quad \text{and} \quad \Sigma + \epsilon I_n \succeq \sum_{i=1}^{K} \alpha_i \Sigma_i

Example: suppose no convex combination aΣ¹ + (1 − a)Σ³ dominates Σ². Then Σ¹ and Σ³ cannot strictly eliminate Σ². However, if we relax the inequality by εI, we can remove Σ².

³ Michael P. Vitus et al. "On efficient sensor scheduling for linear dynamical systems". In: Automatica 48.10 (2012), pp. 2482–2493.
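A sketch of the redundancy test for K = 2 candidates. The exact test is a semidefinite (LMI) feasibility problem; here the simplex weight a is grid-searched, which is only illustrative:

```python
import numpy as np

def is_eps_redundant(Sigma, candidates, eps, grid=101):
    """Approximately check Sigma + eps*I >= a*S1 + (1-a)*S2 for some a in [0,1].

    Grid search over the weight a (K = 2 only); the exact check solves an
    LMI feasibility problem (Vitus et al., 2012)."""
    S1, S2 = candidates
    n = Sigma.shape[0]
    for a in np.linspace(0.0, 1.0, grid):
        M = Sigma + eps * np.eye(n) - (a * S1 + (1 - a) * S2)
        if np.all(np.linalg.eigvalsh(M) >= -1e-12):  # M is PSD
            return True
    return False

# Toy instance: Sigma2 lies slightly below every convex combination of
# Sigma1 and Sigma3, so it is not redundant at eps = 0 but becomes
# redundant once the inequality is relaxed by eps = 0.1.
S1 = np.diag([1.0, 2.0])
S3 = np.diag([2.0, 1.0])
S2 = np.diag([1.3, 1.5])
print(is_eps_redundant(S2, (S1, S3), eps=0.0))  # not dominated exactly
print(is_eps_redundant(S2, (S1, S3), eps=0.1))  # redundant after relaxation
```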

(ε, δ)-Reductions

Can we remove even more trajectories? What if the trajectories do not exactly cross at time t?

Definition (Trajectory δ-Crossing). Trajectories π¹, π² ∈ X^T δ-cross at time t ∈ [1, T] if d_X(π¹_t, π²_t) ≤ δ for δ ≥ 0.

(ε, δ) Reduced Value Iteration

Idea: prune trajectories that are close in space and similar in informativeness. The search alternates between expanding nodes in the tree and (ε, δ)-pruning.

Algorithm 1: (ε, δ) Reduced Value Iteration
 1: J_0 ← 0, S_0 ← {(x_0, Σ_0, J_0)}, S_t ← ∅ for t = 1, …, T
 2: for t = 1 : T do
 3:   for all (x, Σ, J) ∈ S_{t−1} do
 4:     for all u ∈ U do
 5:       x_t ← f(x, u), Σ_t ← ρ^e_t(ρ^p_{t−1}(Σ), x_t)
 6:       J_t ← J + log det(Σ_t)
 7:       S_t ← S_t ∪ {(x_t, Σ_t, J_t)}
 8:     end for
 9:   end for
10:   Sort S_t in ascending order according to log det(·)
11:   S′_t ← S_t[1]
12:   for all (x, Σ, J) ∈ S_t \ S_t[1] do
13:     % Find all nodes in S′_t that δ-cross x:
14:     Q ← {Σ′ | (x′, Σ′, J′) ∈ S′_t, d_X(x, x′) ≤ δ}
15:     if isempty(Q) or not(Σ is ε-alg-red w.r.t. Q) then
16:       S′_t ← S′_t ∪ {(x, Σ, J)}
17:     end if
18:   end for
19:   S_t ← S′_t
20: end for
21: return min{J | (x, Σ, J) ∈ S_T}

Performance Guarantee

The search tree with (ε, δ)-pruning is (ε, δ)-suboptimal, with cost J_T^{ε,δ} such that the following bound holds:⁴

    0 \le J_T^{\epsilon,\delta} - J_T^* \le (\zeta_T - 1)\big[\bar{J}_T - \log\det(W)\big] + \epsilon T    (6)

    \text{where} \quad \zeta_T := \frac{1}{T} \sum_{\tau=1}^{T} \prod_{s=1}^{\tau-1} \big(1 + L_f^s L_m \delta\big)    (7)

and J̄_T, L_f^s, and L_m are constants resulting from continuity assumptions.

Note the tradeoff in performance: as (ε, δ) → (0, 0), J_T^{ε,δ} → J_T^*. If (ε, δ) → (∞, ∞), we get a greedy policy with no guarantee. The bound is monotone in ε and δ.

⁴ Nikolay Atanasov et al. "Information acquisition with sensing robots: Algorithms and error bounds". In: 2014 IEEE International Conference on Robotics and Automation (ICRA). 2014.

Motivation for Anytime Planning

How do ε and δ affect execution time? What if the problem dimension changes (e.g., more targets)? What if the team changes (e.g., more robots)?

All of these can be addressed by modifying the search algorithm to construct the tree iteratively, adding progressively more trajectories until execution time runs out, while always returning a feasible solution. This is called an anytime algorithm.

Anytime Reduced Value Iteration

Idea: first build a greedy search tree, then iteratively call ImprovePath until time expires. To re-use computations efficiently, we mark computed nodes in S_t as open or closed, i.e. O_t, C_t, at every level t of the tree.⁵

Algorithm 2: Anytime Reduced Value Iteration (x_0, Σ_0, T_ARVI)
 1: J_0 ← 0, S_0 ← {(x_0, Σ_0, J_0)}, S_t ← ∅ for t = 1, …, T
 2: C_t ← ∅ for t = 0, …, T
 3: O_t ← S_t for t = 0, …, T
 4: S ← {S_0, …, S_T}, O ← {O_0, …, O_T}, C ← {C_0, …, C_T}
 5: ε, δ ← ∞
 6: {S, O, C} ← ImprovePath(S, O, C, ε, δ)
 7: Publish best solution from O[T]
 8: while Time Elapsed ≤ T_ARVI do
 9:   Decrease (ε, δ)
10:   {S, O, C} ← ImprovePath(S, O, C, ε, δ)
11:   Publish best solution J from O[T]
12: end while

⁵ Brent Schlotfeldt et al. "Anytime Planning for Decentralized Multirobot Active Information Gathering". In: IEEE Robotics and Automation Letters 3.2 (2018), pp. 1025–1032.

Tree Interpretation

Improve Path

The search alternates between expanding nodes that are open but not closed, and (ε, δ)-pruning by marking nodes open.

Algorithm 3: ImprovePath(S, O, C, ε, δ)
 1: for t = 1 : T do
 2:   for all (x, Σ, J) ∈ O_{t−1} \ C_{t−1} do
 3:     C_{t−1} ← C_{t−1} ∪ {(x, Σ)}
 4:     for all u ∈ U do
 5:       x_t ← f(x, u), Σ_t ← ρ^e_t(ρ^p_{t−1}(Σ), x_t)
 6:       J_t ← J + log det(Σ_t)
 7:       S_t ← S_t ∪ {(x_t, Σ_t, J_t)}
 8:     end for
 9:   end for
10:   Sort S_t in ascending order according to log det(·)
11:   O_t ← O_t ∪ S_t[1]
12:   for all (x, Σ, J) ∈ S_t \ O_t do
13:     % Find all nodes in O_t that δ-cross x:
14:     Q ← {Σ′ | (x′, Σ′, J′) ∈ O_t, d_X(x, x′) ≤ δ}
15:     if isempty(Q) or not(Σ is ε-alg-red w.r.t. Q) then
16:       O_t ← O_t ∪ {(x, Σ, J)}
17:     end if
18:   end for
19:   O[t] ← O_t
20: end for
21: return {S, O, C}

Main Result

The following holds regarding the performance of ARVI:⁶

Theorem (ARVI). The ARVI algorithm satisfies:
(i) When ImprovePath(ε, δ) returns with finite (ε, δ), the returned solution is guaranteed to be (ε, δ)-suboptimal.
(ii) The cost J_T^{(n)} of the returned solution decreases monotonically over time.
(iii) Given infinite time, C = O = S, and O[T] will contain the optimal solution to the planning problem.

⁶ Brent Schlotfeldt et al. "Anytime Planning for Decentralized Multirobot Active Information Gathering". In: IEEE Robotics and Automation Letters 3.2 (2018), pp. 1025–1032.

What about multiple robots?

The algorithm still scales exponentially in the number of robots, so consider a coordinate descent approach.⁷ Each robot solves its own planning problem, with the trajectories of the prior robots held fixed. This continues:

    u^c_{1,0:(T-1)} \in \arg\min_{\hat{\mu} \in U_1^T} J_T^{(1)}(\hat{\mu})    (8)

    u^c_{2,0:(T-1)} \in \arg\min_{\hat{\mu} \in U_2^T} J_T^{(2)}\big(u^c_{1,0:(T-1)}, \hat{\mu}\big)    (9)

    ⋮

⁷ Nikolay Atanasov et al. "Decentralized active information acquisition: Theory and application to multi-robot SLAM". In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4775–4782.
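Coordinate descent in code: each robot plans in sequence against the fixed plans of its predecessors. The inner planner below is a stand-in brute-force search over a toy coverage cost, not the paper's ARVI or information objective:

```python
import itertools
import numpy as np

def coordinate_descent(n_robots, U, T, joint_cost):
    """Robot i plans with robots 1..i-1 fixed: linear in n instead of |U|^(nT).

    joint_cost(plans) evaluates a team objective on a list of control
    sequences, one per robot."""
    plans = []
    for i in range(n_robots):
        best_cost, best_seq = np.inf, None
        for seq in itertools.product(U, repeat=T):
            c = joint_cost(plans + [seq])
            if c < best_cost:
                best_cost, best_seq = c, seq
        plans.append(best_seq)
    return plans

# Toy team objective: robots start at 0 and should spread over targets {-2, 2};
# cost is the sum over targets of the squared distance to the closest robot.
def joint_cost(plans):
    finals = [sum(seq) for seq in plans]  # single-integrator endpoints
    return sum(min((f - g) ** 2 for f in finals) for g in (-2, 2))

plans = coordinate_descent(2, [-1, 0, 1], T=2, joint_cost=joint_cost)
print(plans, joint_cost(plans))
```

The sequential answer places the robots at 0 and -2 (cost 4), while the jointly optimal plan (endpoints ±2) has cost 0: a toy illustration of the bounded suboptimality of coordinate descent (the 50% guarantee on the slides is for submodular information objectives, not this toy cost).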

Performance

Coordinate descent reduces the planning complexity from exponential to linear in the number of robots. The algorithm achieves no less than 50% of the optimal performance in the case of submodular objective functions.⁸ The results can also be extended to (approximately) submodular objective functions.

⁸ Nikolay Atanasov et al. "Decentralized active information acquisition: Theory and application to multi-robot SLAM". In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4775–4782.

What about Estimation?

The robots can use a distributed information filter,⁹ which maintains the inverse covariance Ω_{i,t} = Σ_{i,t}^{-1} and the information vector ω_{i,t} = Σ_{i,t}^{-1} ŷ_{i,t}, with ŷ_i(t) := Ω_{i,t}^{-1} ω_{i,t}:

Update step (fuse the neighbors' information and the local measurement):

    \omega_{i,t+1} = \sum_{j \in N_i \cup \{i\}} \kappa_{i,j}\, \omega_{j,t} + H_i^T V_i^{-1} z_i(t)
    \Omega_{i,t+1} = \sum_{j \in N_i \cup \{i\}} \kappa_{i,j}\, \Omega_{j,t} + H_i^T V_i^{-1} H_i    (10)

Predict step:

    \Omega_{i,t+1} \leftarrow \big(A \Omega_{i,t+1}^{-1} A^T + W\big)^{-1}
    \omega_{i,t+1} \leftarrow \Omega_{i,t+1} A \hat{y}_{i,t+1}

⁹ Nikolay Atanasov et al. "Joint estimation and localization in sensor networks". In: 53rd IEEE Conference on Decision and Control (CDC). 2014, pp. 6875–6882.
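In information form the measurement update is additive, which is what makes the distributed fusion above a simple sum over neighbors. A single-robot sanity sketch (no network, so the κ weights are omitted), with assumed toy matrices:

```python
import numpy as np

def info_update(omega, Omega, H, V, z):
    # Additive measurement update in information form.
    Vi = np.linalg.inv(V)
    return omega + H.T @ Vi @ z, Omega + H.T @ Vi @ H

def info_predict(omega, Omega, A, W):
    # Predict: Omega' = (A Omega^{-1} A^T + W)^{-1}, omega' = Omega' A yhat.
    yhat = np.linalg.solve(Omega, omega)
    Omega_n = np.linalg.inv(A @ np.linalg.inv(Omega) @ A.T + W)
    return Omega_n @ (A @ yhat), Omega_n

# Toy 2-D target with a position-only sensor (illustrative values).
A = np.array([[1.0, 0.5], [0.0, 1.0]])
W = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
V = np.array([[0.5]])
Sigma0, y0, z = np.eye(2), np.array([0.0, 1.0]), np.array([0.3])

omega0, Omega0 = np.linalg.solve(Sigma0, y0), np.linalg.inv(Sigma0)
omega1, Omega1 = info_update(omega0, Omega0, H, V, z)
omega2, Omega2 = info_predict(omega1, Omega1, A, W)
print(np.linalg.solve(Omega2, omega2))  # predicted mean estimate
```

The update/predict pair is algebraically equivalent to the covariance-form Kalman filter, which gives an easy correctness check.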

Application: Target Tracking

We use a discretized unicycle model for f(x, u):

    \begin{bmatrix} x^1_{t+1} \\ x^2_{t+1} \\ \theta_{t+1} \end{bmatrix} = \begin{bmatrix} x^1_t \\ x^2_t \\ \theta_t \end{bmatrix} + \begin{bmatrix} \nu\,\mathrm{sinc}\!\left(\frac{\omega\tau}{2}\right) \cos\!\left(\theta_t + \frac{\omega\tau}{2}\right) \\ \nu\,\mathrm{sinc}\!\left(\frac{\omega\tau}{2}\right) \sin\!\left(\theta_t + \frac{\omega\tau}{2}\right) \\ \tau\omega \end{bmatrix}    (11)

The target model is a double integrator:

    y_{t+1} = \begin{bmatrix} I_2 & \tau I_2 \\ 0 & I_2 \end{bmatrix} y_t + w_t, \quad w_t \sim \mathcal{N}\!\left(0,\; q \begin{bmatrix} \tau^3/3\, I_2 & \tau^2/2\, I_2 \\ \tau^2/2\, I_2 & \tau I_2 \end{bmatrix}\right)    (12)

The robots have range and bearing sensors:

    z_{t,m} = h(x_t, y_{t,m}) + v_t, \quad v_t \sim \mathcal{N}\big(0, V(x_t, y_{t,m})\big)

    h(x, y) = \begin{bmatrix} r(x, y) \\ \alpha(x, y) \end{bmatrix} := \begin{bmatrix} \sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2} \\ \tan^{-1}\!\big((y_2 - x_2)/(y_1 - x_1)\big) - \theta \end{bmatrix}    (13)
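The range-bearing model (13) in code, with the bearing wrapped to (-π, π] (the wrapping is an implementation detail, not stated on the slide):

```python
import numpy as np

def h(x, y):
    """Range and bearing from robot pose x = (x1, x2, theta) to target
    position y = (y1, y2), as in (13)."""
    dx, dy = y[0] - x[0], y[1] - x[1]
    r = np.hypot(dx, dy)
    alpha = np.arctan2(dy, dx) - x[2]
    alpha = np.arctan2(np.sin(alpha), np.cos(alpha))  # wrap to (-pi, pi]
    return np.array([r, alpha])

# A robot at the origin facing +x sees a target at (1, 1) at range sqrt(2),
# bearing pi/4.
print(h(np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0])))
```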

Linearization, and MPC

Due to the non-linear observation model, we linearize about a predicted target trajectory ŷ and apply MPC:

    \nabla_y h(x, y) = \frac{1}{r(x, y)} \begin{bmatrix} (y_1 - x_1) & (y_2 - x_2) & 0_{1\times 2} \\ -\sin(\theta + \alpha(x, y)) & \cos(\theta + \alpha(x, y)) & 0_{1\times 2} \end{bmatrix}    (14)

Algorithm 5: Anytime Multi-Robot Target Tracking
 1: Input: T_max, x_0, ω̂_0, Ω̂_0, f, U, H, V, A, W, T, T_ARVI
 2: for t = 1 : T_max do
 3:   Send ω_{i,t} and Ω_{i,t} to neighboring robots.
 4:   Receive measurements z_{i,t} and perform the distributed update step with any neighbor information ω, Ω received.
 5:   Predict a target trajectory of length T: ŷ_t, …, ŷ_T, and linearize the observation model: H_t ← ∇_y h(·, ŷ_t)
 6:   Plan T-step trajectories with ARVI (Alg. 2) and coordinate descent.
 7:   Apply the first control input to move each robot.
 8: end for
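The Jacobian (14) can be checked against finite differences; the 4-D target state is (position, velocity), so the velocity columns are zero, and the specific poses below are arbitrary test values:

```python
import numpy as np

def h(x, y):
    # Range-bearing measurement (depends on target position only).
    dx, dy = y[0] - x[0], y[1] - x[1]
    return np.array([np.hypot(dx, dy), np.arctan2(dy, dx) - x[2]])

def jac_h(x, y):
    """Analytic gradient (14) of h w.r.t. the 4-D target state (p1, p2, v1, v2)."""
    r, alpha = h(x, y)
    return np.array([
        [(y[0] - x[0]) / r, (y[1] - x[1]) / r, 0.0, 0.0],
        [-np.sin(x[2] + alpha) / r, np.cos(x[2] + alpha) / r, 0.0, 0.0],
    ])

x = np.array([0.2, -0.1, 0.3])           # robot pose (x1, x2, theta)
y = np.array([1.5, 0.7, 0.0, 0.0])       # target state (p1, p2, v1, v2)
print(jac_h(x, y))
```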

Simulation Environment

Some Results

[Figure: Simulation results with 9 robots tracking 16 moving targets. Top row, sweeping the communication radius (0 m to ∞) over 1000 timesteps: (a) average target entropy per target (nats), (b) average number of targets detected, (c) average position RMSE per target (m). Bottom row, sweeping the planning time T_ARVI ∈ {0 (greedy), 1.5 s, 3 s} over 600 timesteps: (d) entropy per target, (e) number of targets tracked, (f) position RMSE per target.]

Conclusion

Future work:
- Resilient Active Information Acquisition (IROS 2018)
- Unknown target models
- Quadrotor motion model, camera observation model, continuous actions, etc.