Anytime Planning for Decentralized Multi-Robot Active Information Gathering

Anytime Planning for Decentralized Multi-Robot Active Information Gathering Brent Schlotfeldt 1 Dinesh Thakur 1 Nikolay Atanasov 2 Vijay Kumar 1 George Pappas 1 1 GRASP Laboratory University of Pennsylvania Philadelphia, PA, USA 2 Department of Electrical and Computer Engineering University of California San Diego San Diego, CA, USA May 17, 2018 Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 1 / 25

What is the problem? Suppose the robot at x 0 wants to plan a path to x T to track a target y T Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 2 / 25

Overview 1 Active Information Gathering Problem Statement Reduction to Open-Loop Control, and Optimal Solution (ɛ,δ)-reduced Value Iteration Anytime Reduced Value Iteration 2 Multiple Robots Coordinate Descent Distributed Estimation Application: Target Tracking Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 3 / 25

Problem Statement Consider a team of n mobile robots, with sensor motion models: x i,t+1 = f i (x i,t, u i,t ), i {1,..., n} (1) where x i,t X, u i,t U, and U is a finite set. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 4 / 25

Problem Statement Consider a team of n mobile robots, with sensor motion models: x i,t+1 = f i (x i,t, u i,t ), i {1,..., n} (1) where x i,t X, u i,t U, and U is a finite set. The robots goal is to track a target y t with target motion model: y t+1 = Ay t + w t, w t N (0, W ) t = 0,..., T (2) The robots are able to sense the target using on-board sensors obeying the following sensor observation model: z i,t = H(x i,t )y t + v t (x i,t ), v t N (0, V (x i,t )) (3) Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 4 / 25

Problem Statement Consider a team of n mobile robots, with sensor motion models: x i,t+1 = f i (x i,t, u i,t ), i {1,..., n} (1) where x i,t X, u i,t U, and U is a finite set. The robots goal is to track a target y t with target motion model: y t+1 = Ay t + w t, w t N (0, W ) t = 0,..., T (2) The robots are able to sense the target using on-board sensors obeying the following sensor observation model: z i,t = H(x i,t )y t + v t (x i,t ), v t N (0, V (x i,t )) (3) Note the Linear, Gaussian assumptions on the target and observations, but not on the robot. The set U of admissible controls is finite. To simplify notation, we refer to the state of the team as x t at time t. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 4 / 25

Active Information Gathering Problem (Active Information Acquisition) Given an initial sensor pose x 0 X, a prior distribution over the target state y 0, and a finite planning horizon T, the task is to choose a sequence of control actions: u 0,..., u T 1 U T which minimize conditional entropy h between the target state: y 1:T and the measurement set z 1:T : min u U T J (n) T T (u) := h(y t z 1:t, x 1:t ) (4) t=1 s.t. x t+1 = f (x t, u t ), t = 0,..., T 1 y t+1 = Ay t + w t t = 0,..., T 1 z t = H(x t )y t + v t t = 0,..., T Note that many other information measures are possible, such as mutual information, minimum mean-squared error, and others. 1 1 Nikolay Atanasov et al. Information acquisition with sensing robots: Algorithms and error bounds. In: 2014 IEEE International Conference on Robotics and Automation (ICRA). 2014. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 5 / 25 (5)

Reduction to Open-Loop Control Prior y 0 is Gaussian N (y 0, Σ 0 ), = Kalman filter is optimal. Objective proportional to log det, i.e. h(y t z 1:t, x 1:t ) log det(σ t ) Σ t independent of realizations z 1:t = open loop control optimal! 2 Problem (Active Information Acquisition) min µ U T J (n) T T (µ) := log det(σ t ) t=1 s.t. x t+1 = f (x t, u t ), t = 0,..., T 1 Σ t+1 = ρ p t+1 (ρe t (Σ t ), x t+1 ), t = 0,..., T 1 Update: ρ e t (Σ, x) := Σ K t (Σ, x)h t (x)σ K t (Σ, x) := ΣH t (x) T (H t (x)σh t (x) T + V t (x)) 1 Predict: ρ p t (Σ) := A t ΣA T t + W t 2 Jerome Le Ny and George J Pappas. On trajectory optimization for active sensing in Gaussian process models. In: 2009 IEEE Conference on Decision and Control (CDC). 2009. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 6 / 25

Optimal Solution Deterministic optimal control problem can be solved with Forward Value Iteration, which builds a search tree from (x 0, Σ 0 ) x, 0 0 u 1 u 2 u 3 Depth due to length of planning horizon Full search has complexity O( U T ). Can we do better? Branching due to number of motion primitives Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 7 / 25

What if trajectories cross? Want: Conditions on Σ t to eliminate nodes if trajectories cross at level t of the tree: x 0 1 2 1 x 1 2 1 x1 2 3 x 1 3 a (1 a) 1 3 3 1 x 2 2 x 3 2 x 2 4 x 2 5 x 2 There is a convex combination of Σ 1, and Σ 3 that dominates Σ 2, i.e. Σ 2 aσ 1 + (1 a)σ 3. It can be safely removed from the tree. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 8 / 25

ɛ - Algebraic Redundancy Want: Conditions on Σ t to eliminate nodes if trajectories cross: 3 Definition (ɛ-algebraic Redundancy ) Let ɛ 0 and let {Σ i } K i=1 S + n be a finite set. A matrix Σ S+ n is ɛ-algebraically redundant with respect to {Σ i } if there exist nonnegative constants {α i } K i=1 such that: K K α i = 1 and Σ + ɛi n α i Σ i i=1 i=1 a (1 a) 1 3 Example: Suppose aσ 1 + (1 a)σ 3 is a convex combination of Σ 1, and Σ 2. Σ 1 and Σ 3 cannot strictly eliminate Σ 2, because it is not contained in aσ 1 + (1 a)σ 3. However if we relax the inequality by ɛi, we can remove Σ 2. 3 Michael P Vitus et al. On efficient sensor scheduling for linear dynamical systems. In: Automatica 48.10 (2012), pp. 2482 2493. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 9 / 25

(ɛ, δ)- Reductions Can we remove even more trajectories? What if the trajectories do not exactly cross at time t? Definition (Trajectory δ-crossing) Trajectories π 1, π 2 X T δ-cross at time t [1, T ] if d X (π 1 t, π 2 t ) δ for δ 0. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 10 / 25

(ɛ, δ) Reduced Value Iteration Idea: Prune trajectories close in space and similar in informativeness Expanding Nodes in Search Tree = (ɛ,δ)-pruning = Algorithm 1 (ɛ, δ) Reduced Value Iteration 1: J 0 0, S 0 {(x 0, Σ 0, J 0)}. S t for t = 1,...T 2: for t = 1 : T do 3: for all (x, Σ, J) S t 1 do 4: for all u U do 5: x t f (x, u), Σ t ρ e t+1(ρ p t (Σ, x, u), x t)) 6: J t J + log det(σ t) 7: S t S t {(x t, Σ t, J t)} 8: end for 9: end for 10: Sort S t in ascending order according to log det( ) 11: S t S t[1] 12: for all (x, Σ, J) S t \ S t[1] do 13: % Find all nodes in S t, which δ-cross x: 14: Q {(Σ ) (x, Σ, J )} S t, d X (x, x ) δ 15: if isempty(q) or not(σ is ɛ-alg-red wrt Q) then 16: S t S t (x, Σ, J) 17: end if 18: end for 19: S t S t 20: end for 21: return min {J (x, Σ) S T } Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 11 / 25

Performance Guarantee Search tree with (ɛ, δ)-pruning is (ɛ, δ)-suboptimal with cost J ɛ,δ T that the bound holds 4 such 0 J ɛ,δ T J T (ζ T 1) [ JT log det(w )] + ɛ T (6) t 1 τ where ζ T := (1 + L s f L mδ) 1 (7) and T, L s f and L m are constants resulting from continuity assumptions. τ=1 Note the tradeoff in performance: (ɛ,δ) (0, 0) = J ɛ,δ T J T. If (ɛ,δ) (, ), we get a greedy policy with no guarantee. This bound is monotone in ɛ, δ. s=1 4 Nikolay Atanasov et al. Information acquisition with sensing robots: Algorithms and error bounds. In: 2014 IEEE International Conference on Robotics and Automation (ICRA). 2014. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 12 / 25

Motivation for Anytime Planning How do ɛ,δ affect execution time? What if problem dimension changes? (e.g. more targets) What if team changes? (e.g. more robots) All of these can be addressed by modifying the search algorithm to iteratively construct the tree, adding progressively more trajectories until execution time runs out, always returning a feasible solution. This is called an anytime algorithm. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 13 / 25

Anytime Reduced Value Iteration Idea: First build a greedy search tree, then iteratively ImprovePath until time expires To efficiently re-use computations, we mark all computed nodes S t as open and closed, i.e. O t, C t at all levels t in the tree 5 Algorithm 2 Anytime Reduced Value Iteration (x 0, Σ 0, T ARVI ) 1: J 0 0, S 0 {(x 0, Σ 0, J 0)}. S t for t = 1,...T 2: C 0 3: O t S t for t = 0,...T 4: S {S 0,..S T }, O {O 0,...O T }, C {C 0,..C T } 5: ɛ, δ 6: {S, O, C} ImprovePath(S, O, C, ɛ, δ) 7: Publish best solution from O[T ] 8: while Time Elapsed T ARVI do 9: Decrease (ɛ, δ) 10: {S, O, C} ImprovePath(S, O, C, ɛ, δ) 11: Publish best solution J from O[T ] 12: end while 5 Brent Schlotfeldt et al. Anytime Planning for Decentralized Multirobot Active Information Gathering. In: IEEE Robotics and Automation Letters 3.2 (2018), pp. 1025 1032. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 14 / 25

Tree Interpretation Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 15 / 25

Improve Path Expanding Open, but not Closed nodes = (ɛ,δ)-pruning by marking nodes Open = Algorithm 3 ImprovePath (S, O, C, ɛ, δ) 1: for t = 1 : T do 2: for all (x, Σ, J) O t 1 \ C t 1 do 3: C t 1 C t 1 {x, Σ} 4: for all u U do 5: x t f (x, u), Σ t ρ xt (Σ) 6: J t J + log det(σ t ) 7: S t S t {(x t, Σ, J t ))} 8: end for 9: end for 10: Sort S t in ascending order according to log det( ) 11: O t O t S t [1] 12: for all (x, Σ, J) S t \ O t do 13: % Find all nodes in O t, which δ-cross x: 14: Q {Σ (x, Σ, J )} O t, d X (x, x ) δ 15: O t O t (x, Σ, J) 16: end for 17: O[t] = O t 18: end for 19: return {S, O, C} Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 16 / 25

Main Result The following holds regarding the performance of ARVI: 6 Theorem (ARVI) The following are satisfied by the ARVI algorithm: (i) When ImprovePath(ɛ, δ) returns with finite (ɛ, δ), the returned solution is guaranteed to be (ɛ, δ)-sub-optimal. (ii) The cost J (n) T time. of the returned solution decreases monotonically over (iii) Given infinite time, C = O S, and O[T ] will contain the optimal solution to the planning problem. 6 Brent Schlotfeldt et al. Anytime Planning for Decentralized Multirobot Active Information Gathering. In: IEEE Robotics and Automation Letters 3.2 (2018), pp. 1025 1032. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 17 / 25

What about multiple robots? Algorithm still scales exponentially in the number of robots. Consider a coordinate descent approach 7. Each robot solves their own planning problem with fixed trajectories from prior robots. This continues: u1,0:(t c 1) argmin J (1) T (ˆµ) (8) ˆµ U1 T u2,0:(t c 1) argmin J (2) T (uc 1,0:(T 1), ˆµ) (9) ˆµ U2 T. 7 Nikolay Atanasov et al. Decentralized active information acquisition: Theory and application to multi-robot SLAM.. In: 2015 IEEE International Conference on Robotics and Automation (ICRA) (2015), pp. 4775 4782. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 18 / 25

Performance Coordinate Descent reduces the planning complexity from exponential to linear in number of robots. Algorithm achieves no less than 50% of the optimal performance, in the case of submodular objective functions 8. Results can also be extended to (approximately) sub-modular objective functions. 8 Nikolay Atanasov et al. Decentralized active information acquisition: Theory and application to multi-robot SLAM.. In: 2015 IEEE International Conference on Robotics and Automation (ICRA) (2015), pp. 4775 4782. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 19 / 25

What about Estimation? The robots can use a distributed information filter 9, which uses the inverse covariance Ω i,t = Σ 1 i,t and ω i,t = Σ 1 i,t y i,t: Update Step: ω i,t+1 = Predict Step: Ω i,t+1 = j N i {i} j N i {i} ŷ i (t) := Ω 1 i,t ω i,t κ i,j ω j,t + H T i κ i,j Ω j,t + H T i Ω i,t+1 = (AΩ 1 i,t+1 AT + W ) 1 ω i,t+1 = Ω i,t+1 Aŷ i,t+1 V 1 i z i (t) V 1 i H i (10) 9 Nikolay Atanasov et al. Joint estimation and localization in sensor networks. In: Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on. IEEE. 2014, pp. 6875 6882. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 20 / 25

Application: Target Tracking We use a discretized unicycle model for f (x, u): x t+1 1 x xt+1 2 t 1 ν sinc( ωτ = xt 2 2 + ) cos(θ t + ωτ ν sinc( ωτ 2 ) sin(θ t + ωτ θ t+1 θ t τω 2 ) 2 ) (11) Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 21 / 25

Application: Target Tracking We use a discretized unicycle model for f (x, u): x t+1 1 x xt+1 2 t 1 ν sinc( ωτ = xt 2 2 + ) cos(θ t + ωτ ν sinc( ωτ 2 ) sin(θ t + ωτ θ t+1 θ t τω 2 ) 2 ) (11) The target model is a double integrator: [ ] I2 τi y t+1 = A 2 y 0 I t + w t, w t N 2 ( [ ]) τ 0, q 3 /3I 2 τ 2 /2I 2 τ 2 /2I 2 τi 2 (12) Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 21 / 25

Application: Target Tracking We use a discretized unicycle model for f (x, u): x t+1 1 x xt+1 2 t 1 ν sinc( ωτ = xt 2 2 + ) cos(θ t + ωτ ν sinc( ωτ 2 ) sin(θ t + ωτ θ t+1 θ t τω 2 ) 2 ) (11) The target model is a double integrator: [ ] I2 τi y t+1 = A 2 y 0 I t + w t, w t N 2 The robots have range and bearing sensors. ( [ ]) τ 0, q 3 /3I 2 τ 2 /2I 2 τ 2 /2I 2 τi 2 z t,m = h(x t, y t,m ) + v t, v t N ( 0, V (x t, y t,m ) ) [ ] [ ] r(x, y) (y 1 x h(x, y) = := 1 ) 2 + (y 2 x 2 ) 2 α(x, y) tan 1 ((y 2 x 2 )(y 1 x 1 )) θ (12) (13) Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 21 / 25

Linearization, and MPC Due to the non-linear observation model, we linearize about a predicted target trajectory y and apply MPC: y h(x, y) = 1 [ ] (y 1 x 1 ) (y 2 x 2 ) 0 1x2 (14) r(x, y) sin(θ + α(x, y)) cos(θ + α(x, y)) 0 1x2 Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 22 / 25

Linearization, and MPC Due to the non-linear observation model, we linearize about a predicted target trajectory y and apply MPC: y h(x, y) = 1 [ ] (y 1 x 1 ) (y 2 x 2 ) 0 1x2 (14) r(x, y) sin(θ + α(x, y)) cos(θ + α(x, y)) 0 1x2 Algorithm 5 Anytime Multi-Robot Target Tracking 1: Input: T max, x 0, ˆω 0, ˆΩ 0, f, U, H, V, A, W, T, T ARVI 2: while t = 1 : T max do 3: Send ω i,t and Ω i,t to neighboring robots. 4: Receive measurements z i,t and perform distributed update step with any neighbor ω i,t, Ω i,t received. 5: Predict a target trajectory of length T : ŷ t,...ŷ T, and linearize observation model: H t y h(, ŷ t ) 6: Plan T -step trajectories with ARVI (Alg. 2) and coordinate descent. 7: Apply first control input to move each robot. 8: end while Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 22 / 25

Simulation Environment Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 23 / 25

Some Results 2 Average Target Entropy 10 Number of Targets 1.2 Average RMSE over time 0 m Entropy per target (nats) 1 0-1 0 m 1 m 5 m 10 m 15 m m Average targets detected 8 6 4 2 0 m 1 m 5 m 10 m 15 m m RMSE per target (m) 1 0.8 0.6 0.4 0.2 1 m 5 m 10 m 15 m m -2 0 200 400 600 800 1000 Timestep (a) 0 0 200 400 600 800 1000 Timestep (b) 0 0 200 400 600 800 1000 Timestep (c) 4 Entropy per Target 25 Number of Targets Tracked 2 Position RMSE per target Entropy per target (nats) 3 2 1 0-1 -2 T ARVI = 0 (greedy) T ARVI = 1.5s T ARVI = 3s Number of targets 20 15 10 5 T ARVI = 0 (greedy) T ARVI = 1.5s T ARVI = 3s RMSE (m) 1.5 1 0.5 T ARVI = 0 (greedy) T ARVI = 1.5s T ARVI = 3s -3 0 100 200 300 400 500 600 Timestep (d) 0 0 100 200 300 400 500 600 Timestep (e) 0 0 100 200 300 400 500 600 Timestep (f) Figure: Simulation results with 9 robots tracking 16 moving targets. The top row sweeps over communication radius, and the bottom row sweeps planning time. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 24 / 25

Conclusion Future Work: Resilient Active Information Acquisition (IROS 2018) Unknown Target Models Quadrotor Motion Model, Camera observation Model, Continuous Actions, etc. Brent Schlotfeldt 1, Dinesh Thakur 1, Nikolay Atanasov 2, Anytime Vijay Kumar Planning 1,, George Pappas 1 (Penn) May 17, 2018 25 / 25