10 Robotic Exploration and Information Gathering

1 NAVARCH/EECS 568, ROB 530 - Winter 2018. 10 Robotic Exploration and Information Gathering. Maani Ghaffari. April 2, 2018

2 Robotic Information Gathering: Exploration and Monitoring In information gathering tasks, such as robotic exploration, reducing uncertainty is the direct goal of action selection (Thrun 2005).

3 Fully Unknown Environment How to task a robot to autonomously map and explore a fully unknown environment?

4 Sequential Decision Making Underlying dynamics are Markovian (recall slides 03 Estimation). Under full observability, the sequential decision-making problem is an instance of a Markov Decision Process (MDP), studied as early as the 1950s 1. In the MDP framework the sensor model is deterministic and bijective, but uncertainty in the action is allowed. In robotics, measurements and actions are both stochastic; this leads to Partially Observable Markov Decision Processes (POMDPs) 2. 1. Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), pp. 679-684. 2. Kaelbling, L.P., Littman, M.L. and Cassandra, A.R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2), pp. 99-134.

5 Sequential Decision Making Two extreme cases: a planning horizon of 1, known as the greedy approach; a planning horizon of $\infty$, often treated as a discounted problem in which the payoff diminishes as the horizon goes to infinity.

6 Frontier-based Exploration Frontier-based exploration using occupancy grid maps (OGMs) (Yamauchi 1997). [Figure: frontier-based exploration on an occupancy grid map.]

7 Nearest Frontier The cost function, $f_c : \mathcal{A}_t \to \mathbb{R}_{\geq 0}$, is the length of the path (e.g., computed using A*, RRT*, etc.) from the current robot pose to the corresponding frontier. Optimal action: $a_t^* = \operatorname*{argmin}_{a_t \in \mathcal{A}_t} f_c(a_t)$.
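A minimal sketch of nearest-frontier selection, under assumptions not in the slides: cells encoded as -1 (unknown), 0 (free), 1 (occupied); 4-connected motion; BFS distance over free cells standing in for the A*/RRT* path cost $f_c$.

```python
from collections import deque

def is_frontier(grid, r, c):
    """A frontier is a free cell adjacent to at least one unknown cell."""
    if grid[r][c] != 0:
        return False
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]) and grid[rr][cc] == -1:
            return True
    return False

def nearest_frontier(grid, start):
    """BFS from the robot cell; the first frontier reached minimizes f_c."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        (r, c), dist = queue.popleft()
        if is_frontier(grid, r, c):
            return (r, c), dist          # argmin over A_t of f_c(a_t)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < len(grid) and 0 <= cc < len(grid[0])
                    and grid[rr][cc] == 0 and (rr, cc) not in seen):
                seen.add((rr, cc))
                queue.append(((rr, cc), dist + 1))
    return None, float("inf")            # no frontier left: map fully explored

grid = [[0, 0, -1],
        [0, 1, -1],
        [0, 0,  0]]
print(nearest_frontier(grid, (2, 2)))    # (2, 2) itself borders unknown space
```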

8 Information Gain How rewarding is an action?

9 Where to Move Next? Select the target with the highest information gain. (Courtesy: C. Stachniss)

10 Information Theory Entropy is a measure of the uncertainty of a random variable: $H(X) = \mathbb{E}_{p(x)}\left[\log \frac{1}{p(x)}\right] = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$. The conditional entropy: $H(Y \mid X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x)$.

11 Information Theory The mutual information (MI) is the reduction in the uncertainty of one random variable due to the knowledge of the other: $I(X; Y) = D_{KL}(p(x, y) \,\|\, p(x)p(y)) = \mathbb{E}_{p(x,y)}\left[\log \frac{p(x, y)}{p(x)p(y)}\right]$, or equivalently $I(X; Y) = H(X) - H(X \mid Y)$.
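A hedged numeric check of the identities on slides 10-11 for discrete variables given as a joint table; the toy distribution and all names are illustrative.

```python
import numpy as np

def entropy(p):
    """H = -sum p log2 p, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) via the entropy identity H(X) + H(Y) - H(X,Y)."""
    return entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy)

def conditional_entropy(p_xy):
    """H(X|Y) = H(X,Y) - H(Y)."""
    return entropy(p_xy) - entropy(p_xy.sum(axis=0))

p_xy = np.array([[0.3, 0.1],
                 [0.1, 0.5]])                       # toy joint distribution
h_x = entropy(p_xy.sum(axis=1))
print(mutual_information(p_xy))                     # one form of I(X;Y)
print(h_x - conditional_entropy(p_xy))              # H(X) - H(X|Y): same value
```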

12 Mutual Information-based Exploration Direct calculation of the information gain using numerical integration techniques: $I(M; Z_{t+1} \mid z_{1:t}) = \underbrace{H(M \mid z_{1:t})}_{\text{map entropy}} - \underbrace{H(M \mid Z_{t+1}, z_{1:t})}_{\text{map conditional entropy}}$, where $M$ is the map, $Z_{t+1}$ the future observations, and $z_{1:t}$ the observations up to time $t$.

13 Maximum Information Gain The information gain-based utility function is $f_I : \mathcal{A}_t \to \mathbb{R}_{\geq 0}$. Optimal action: $a_t^* = \operatorname*{argmax}_{a_t \in \mathcal{A}_t} f_I(a_t)$.

14 Cost-Utility Trade-off Let $g : \mathbb{R}_{\geq 0}^2 \to \mathbb{R}_{\geq 0}$ be a function that takes $f_c(a_t)$ and $f_I(a_t)$ as its input arguments. The total utility function: $u(a_t) \triangleq g(f_I(a_t), f_c(a_t))$. Optimal action: $a_t^* = \operatorname*{argmax}_{a_t \in \mathcal{A}_t} u(a_t)$.
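The slide leaves $g$ unspecified; one common instantiation (an assumption here, e.g., González-Baños and Latombe 2002) discounts information gain exponentially by travel cost, $u(a) = f_I(a)\, e^{-\lambda f_c(a)}$. A minimal sketch:

```python
import math

def utility(f_info, f_cost, lam=0.2):
    """u(a) = f_I(a) * exp(-lam * f_c(a)); lam trades gain against cost."""
    return f_info * math.exp(-lam * f_cost)

# (name, information gain, path cost) -- illustrative candidate frontiers
candidates = [("A", 4.0, 10.0), ("B", 3.0, 2.0), ("C", 5.0, 25.0)]
best = max(candidates, key=lambda a: utility(a[1], a[2]))
print(best[0])   # "B": lower gain than C, but far cheaper to reach
```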

15 Greedy Mutual Information-based Exploration Key idea: the MI-based utility function is computed at the centroids of geometric frontiers, and the frontier with the highest utility is chosen as the next-best macro-action 3. Definition (Macro-action): a macro-action is an exploration target (frontier) which is assumed to be reachable through an open-loop control strategy. 3. He, R., Brunskill, E. and Roy, N. (2010). PUMA: Planning under uncertainty with macro-actions. In AAAI.

16 Robotic Information Gathering Mission Perception system: outputs a belief distribution over the state variables. Planning and decision-making: consumes data and the belief, with replanning. Acting + sensing: executes the chosen action and collects data. Q1: When to terminate the mission? Q2: When to stop planning? Q3: How to accept full perception uncertainty?

17 Planning Problem How to get from A to B? [Figure: planning from A to B in an environment S.]

18 RRT Randomized kinodynamic planning (LaValle and Kuffner 2001). [Figure: RRT extend operation, stepping from the nearest tree node $x_{near}$ of the tree rooted at $x_{init}$ toward a sample $x$ to create $x_{new}$.]
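A sketch of the extend operation in the figure: sample $x$, find the nearest tree node $x_{near}$, and step toward $x$ by at most `eps` to get $x_{new}$. Straight-line steering with no obstacle check, both simplifying assumptions.

```python
import math, random

def extend(tree, x, eps=0.5):
    """Add x_new = x_near stepped toward sample x; returns the new node."""
    x_near = min(tree, key=lambda v: math.dist(v, x))   # nearest tree node
    d = math.dist(x_near, x)
    if d < 1e-9:
        return None
    t = min(1.0, eps / d)                               # clip the step length
    x_new = (x_near[0] + t * (x[0] - x_near[0]),
             x_near[1] + t * (x[1] - x_near[1]))
    tree[x_new] = x_near                                # store parent pointer
    return x_new

tree = {(0.0, 0.0): None}                               # x_init is the root
for _ in range(200):
    extend(tree, (random.uniform(0, 10), random.uniform(0, 10)))
print(len(tree))                                        # tree has grown
```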

19 RRT* (Cost), RIG (Cost + Information gain) Sampling-based algorithms for optimal motion planning (Karaman and Frazzoli 2011); Rapidly-exploring Information Gathering (Hollinger and Sukhatme 2014).

20 Decision-Theoretic Planning What if there is no B (artificial targets) and the robot poses are uncertain? [Figure: start A in environment S, with no fixed goal.]

21 Incremental Informative Motion Planning Infinite-horizon planning: $P_t^* = \operatorname*{argmax}_{P_t \in \mathcal{A}_t} f_I(P_t)$ subject to $\underbrace{f_c(P_t) \leq b_t}_{\text{budget constraint}}$, $\forall t > t_s$, and $\underbrace{S = s_{0:t_s}}_{\text{state estimate}}$.

22 Incremental Informative Motion Planning $P_t^* = \operatorname*{argmax}_{P_t \in \mathcal{A}_t} f_I(P_t)$ subject to $f_c(P_t) \leq b_t$ (budget constraint), $\forall t > t_s$, and $S = s_{0:t_s}$ (state estimate). $\mathcal{A}_t$: the set of all possible trajectories (action space) at time $t$. $S = s_{0:t_s}$: the state estimate up to time $t_s$. $f_I(P_t)$: information function. $f_c(P_t)$: cost function. $b_t$: budget.
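A hedged sketch of the constrained selection above: among candidate trajectories (here plain waypoint lists), keep those whose cost fits the budget $b_t$ and return the one maximizing $f_I$. The two stand-in functions are illustrative placeholders, not the IIG machinery.

```python
import math

def f_cost(P):
    """Path length plays the role of the cost function f_c."""
    return sum(math.dist(P[i], P[i + 1]) for i in range(len(P) - 1))

def f_info(P):
    """Toy information function: number of distinct cells visited."""
    return len({(round(x), round(y)) for x, y in P})

def best_trajectory(candidates, budget):
    """argmax of f_I over the candidates satisfying f_c(P) <= budget."""
    feasible = [P for P in candidates if f_cost(P) <= budget]
    return max(feasible, key=f_info) if feasible else None

paths = [[(0, 0), (1, 0), (2, 0)],
         [(0, 0), (1, 1), (2, 2), (3, 3)],
         [(0, 0), (5, 5), (10, 10)]]
print(best_trajectory(paths, budget=5.0))   # most informative path within budget
```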

23 Common Approximations and Simplifications (Not Desirable) Discretizing the state and/or action space. Making the state and/or action space finite. Greedy planning, or planning for a limited/short horizon. Assuming the state or part of the state is fully observable (ignoring the uncertainty). Assuming maximum likelihood observations (optimistic).

24 Incrementally-exploring Information Gathering The IIG framework: Cost + Information gain + Convergence. 1. The belief representation can be dense. 2. Information-theoretic convergence. 3. It takes into account all future measurements. 4. An information-theoretic interpretation of the planning horizon.

25 Information Functions Algorithms: Mutual Information (MI), computed directly; Gaussian Processes Variance Reduction (GPVR), non-parametric. Algorithmic implementations are available in the paper.

26 Non-parametric Information Functions The predictive variance of a Gaussian process does not depend on the actual realization of the observations: $\mathbb{V}[f_*] = k(x_*, x_*) - k(X, x_*)^\top [K(X, X) + \sigma_n^2 I_n]^{-1} k(X, x_*)$. The mutual information between the state $X$ and observations $Z$ can be approximated as $\hat{I}(X; Z) = \sum_{i=1}^n \log(\sigma_{X_i}) - \sum_{i=1}^n \log(\sigma_{X_i \mid Z})$, where $\sigma_{X_i}$ and $\sigma_{X_i \mid Z}$ are the prior and posterior marginal variances.
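A numeric sketch of GPVR under an assumed squared-exponential kernel and noise level: the posterior variance, and hence the MI approximation, depends only on where we plan to measure, not on the measured values.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between row-stacked points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def posterior_var(X_train, X_query, sigma_n=0.1):
    """Marginal GP posterior variances: k(x*,x*) - k*^T [K + s^2 I]^-1 k*."""
    K = rbf(X_train, X_train) + sigma_n**2 * np.eye(len(X_train))
    k_star = rbf(X_train, X_query)                  # shape (n_train, n_query)
    quad = np.einsum('ij,jk,ik->i', k_star.T, np.linalg.inv(K), k_star.T)
    return 1.0 - quad                               # k(x*, x*) = 1 for the RBF

rng = np.random.default_rng(0)
X_meas = rng.random((20, 2)) * 10        # planned measurement locations
X_map = rng.random((200, 2)) * 10        # points whose uncertainty we track
sigma_prior = np.ones(len(X_map))        # prior marginal std (RBF: k = 1)
sigma_post = np.sqrt(np.maximum(posterior_var(X_meas, X_map), 1e-12))
I_hat = np.sum(np.log(sigma_prior)) - np.sum(np.log(sigma_post))
print(I_hat)   # larger when the planned sites cover the map better
```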

27 Non-parametric Information Functions: UGPVR How to incorporate the robot pose uncertainty? Take the expectation of the kernel with respect to the probability distribution function $p(x)$: $\bar{k} = \mathbb{E}[k] = \int_{\mathcal{X}} k \, p(x) \, dx$.
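A Monte Carlo sketch of the expected kernel: under pose uncertainty, $k(x, x')$ is replaced by its expectation over $p(x)$. Gaussian pose noise and the RBF kernel are assumptions; for that pair a closed form exists, but sampling shows the idea for any kernel.

```python
import numpy as np

def expected_rbf(x_mean, x_cov, x_prime, ell=1.0, n_samples=2000):
    """Monte Carlo estimate of k_bar = E[k(x, x')] with x ~ N(x_mean, x_cov)."""
    rng = np.random.default_rng(0)
    xs = rng.multivariate_normal(x_mean, x_cov, n_samples)
    d2 = ((xs - x_prime) ** 2).sum(axis=1)
    return np.exp(-0.5 * d2 / ell**2).mean()

# Compare against the kernel evaluated at the mean pose, exp(-0.5) ~ 0.61:
print(expected_rbf(np.zeros(2), 0.3 * np.eye(2), np.array([1.0, 0.0])))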

28 Convergence of the Planner The relative information contribution of node $n_{new}$: $\mathrm{RIC} \triangleq \frac{I_{new}}{I_{near}} - 1$. $n_{sample}$: the number of samples it takes to find $n_{new}$. The penalized relative information contribution: $I_{RIC} \triangleq \frac{\mathrm{RIC}}{n_{sample}}$.

29 Incrementally-exploring Information Gathering $I_{RIC}$ is non-dimensional. $\delta_{RIC}$ sets the planning horizon ($T$) from the information-gathering point of view. By using smaller values of $\delta_{RIC}$, the planner can reach farther points in both the spatial and belief spaces 4: if $\delta_{RIC} \to 0$, then $T \to \infty$. 4. See Algorithm 2 (IIG-tree) in the paper.
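A hedged sketch of the stopping rule on slides 28-29: grow the tree until $I_{RIC}$ falls below $\delta_{RIC}$. Averaging $I_{RIC}$ over a sliding window is an assumption here; IIG-tree's exact bookkeeping is in the paper.

```python
def penalized_ric(i_new, i_near, n_sample):
    """I_RIC = RIC / n_sample, with RIC = I_new / I_near - 1 (slide 28)."""
    return (i_new / i_near - 1.0) / n_sample

def horizon_reached(ric_history, delta_ric=1e-2, window=30):
    """Stop extending once the windowed average of I_RIC drops below delta_RIC."""
    if len(ric_history) < window:
        return False
    return sum(ric_history[-window:]) / window < delta_ric

# inside the planner loop, after a node found in n_sample tries:
#   ric_history.append(penalized_ric(I_new, I_near, n_sample))
#   if horizon_reached(ric_history): stop; planning horizon T reached
```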

30 Convergence of the Planner [Figure.]

31 Information-theoretic Robotic Exploration When to stop the information gathering mission?

32 Information-theoretic Robotic Exploration Definition (Map saturation probability): the probability at which the robot is completely confident about the occupancy status of a point is defined as $p_{sat}$. Definition (Map saturation entropy): the entropy of a map point whose occupancy probability is $p_{sat}$ is defined as $h_{sat} \triangleq H(p_{sat})$.

33 Information Theory (Cover and Thomas 1991) Theorem (Conditioning reduces entropy; information cannot hurt): $H(X \mid Y) \leq H(X)$. Theorem (Chain rule for entropy): $H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n H(X_i \mid X_{i-1}, \ldots, X_1)$.

34 Information Theory (Cover and Thomas 1991) Theorem (Independence bound on entropy): $H(X_1, X_2, \ldots, X_n) \leq \sum_{i=1}^n H(X_i)$. Proof idea: expand the LHS using the chain rule for entropy, then apply conditioning reduces entropy to each term.

35 Information-theoretic Robotic Exploration Theorem (The least upper bound of the average map entropy): let $n \in \mathbb{N}$ be the number of map points. In the limit, for a completely explored occupancy map, the least upper bound of the average map entropy is given by $\sup \frac{1}{n} H(M) = h_{sat} = H(p_{sat})$.

36 Information-theoretic Robotic Exploration Proof. From the independence bound on entropy, multiplying each side of the inequality by $\frac{1}{n}$, we can write the average map entropy as $\frac{1}{n} H(M) \leq \frac{1}{n} \sum_{i=1}^n H(M = m^{[i]})$. Taking the limit as $p(m) \to p_{sat}$, $\lim_{p(m) \to p_{sat}} \frac{1}{n} H(M) \leq \lim_{p(m) \to p_{sat}} \frac{1}{n} \sum_{i=1}^n H(M = m^{[i]}) = H(p_{sat})$, hence $\sup \frac{1}{n} H(M) = H(p_{sat})$.

37 Information-theoretic Robotic Exploration Remark: the result also extends to continuous random variables and differential entropy. Remark: note that we do not assume any distribution for map points; the entropy can be calculated either assuming the map points are normally distributed or treating them as Bernoulli random variables. Remark: since $0 < p_{sat} < 1$ and $H(p_{sat}) = H(1 - p_{sat})$, one saturation entropy can be set for the entire map.

38 Map Exploration Termination Corollary (Map exploration termination): the problem of autonomous robotic exploration for mapping can be terminated when $\frac{1}{n} \sum_{i=1}^n H(M = m^{[i]}) \leq H(p_{sat})$. Setting a threshold in the information space is the natural way to account for uncertainty in the robot's perception.
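A sketch of the stopping criterion treating map cells as Bernoulli random variables, as the remarks on slide 37 allow; $p_{sat} = 0.95$ is an illustrative choice, and by symmetry $H(0.95) = H(0.05)$ covers saturated-free and saturated-occupied cells alike.

```python
import numpy as np

def bernoulli_entropy(p):
    """Per-cell entropy in bits, clipped away from 0 and 1 for stability."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def exploration_done(occupancy_probs, p_sat=0.95):
    """True once the average map entropy drops to the saturation entropy."""
    h_sat = bernoulli_entropy(p_sat)
    return bernoulli_entropy(occupancy_probs).mean() <= h_sat

grid = np.full((100, 100), 0.5)        # all unknown: H = 1 bit per cell
print(exploration_done(grid))           # False
grid[:] = 0.97                          # nearly saturated everywhere
print(exploration_done(grid))           # True
```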

39 Information Gathering Termination Corollary (Information gathering termination): given a saturation entropy $h_{sat}$, the problem of information gathering for desired random variables $X_1, X_2, \ldots, X_n$ whose support is the alphabet $\mathcal{X}$ can be terminated when $\frac{1}{n} \sum_{i=1}^n H(X_i) \leq h_{sat}$. Regardless of the quantity of interest, this provides a stopping criterion for the exploration mission.

40 Robotic Exploration in Unknown Environment Comparison of Active Pose SLAM (APS) (Valencia et al. 2012) and IIG; Cave dataset.

41 Lake Monitoring Experiment Survey of the lake area using an Autonomous Surface Vehicle (ASV) (Hollinger and Sukhatme 2014). The wireless signal strength map of the lake area is regressed using a Gaussian process; the map is used as a proxy for groundtruth. [Figure panels: (a) survey, (b) mean surface, (c) variance surface.]

42 Lake Monitoring Experiment The ASV can localize using a GPS unit and a Doppler velocity log. Communication with the ground station is through a wireless connection, and at any location the wireless signal strength (WSS) can be measured in dBm. The dataset includes about 2700 observations and was collected through a full survey of the lake area at Puddingstone Lake in San Dimas, CA. At any location the robot can take measurements, within its sensing range, from the groundtruth maps.

43 IIG-tree using UGPVR Use the collected measurements along the most informative path extracted from the IIG graph and rebuild the GP WSS mean surface. RMSE: 4.26 ± 0.05 dBm; time: ± 0.67 sec. The numbers are averaged over 100 runs (mean ± standard error).

44 Some Related Readings Probabilistic Robotics, Ch. 17. Yamauchi, B. (1997). A frontier-based approach for autonomous exploration. In IEEE International Symposium on Computational Intelligence in Robotics and Automation, pp. 146-151. Stachniss, C., Grisetti, G. and Burgard, W. (2005). Information gain-based exploration using Rao-Blackwellized particle filters. In Robotics: Science and Systems. Atanasov, N. A. (2015). Active Information Acquisition with Mobile Robots. PhD thesis, University of Pennsylvania. Hollinger, G.A. and Sukhatme, G.S. (2014). Sampling-based robotic information gathering algorithms. The International Journal of Robotics Research, 33(9), pp. 1271-1287. Ghaffari Jadidi, M., Valls Miro, J. and Dissanayake, G. Sampling-based incremental information gathering with applications to robotic exploration and environmental monitoring. arXiv preprint. Code available online.
