10 Robotic Exploration and Information Gathering
1 NAVARCH/EECS 568, ROB 530 - Winter 2018. Robotic Exploration and Information Gathering. Maani Ghaffari. April 2, 2018
2 Robotic Information Gathering: Exploration and Monitoring In information gathering tasks, such as robotic exploration, reducing uncertainty is the direct goal of action selection (Thrun 2005). 2
3 Fully Unknown Environment How to task a robot to autonomously map and explore a fully unknown environment? 3
4 Sequential Decision Making. Underlying dynamics are Markovian (recall slides 03, Estimation). Under full observability, the sequential decision-making problem is an instance of a Markov Decision Process (MDP), studied as early as the 1950s [1]. In the MDP framework the sensor model is deterministic and bijective, but uncertainty in actions is allowed. In robotics, measurements and actions are both stochastic; this leads to Partially Observable Markov Decision Processes (POMDPs) [2]. [1] Bellman, R. (1957). A Markovian Decision Process. Journal of Mathematics and Mechanics, Vol. 6. [2] Kaelbling, L.P., Littman, M.L. and Cassandra, A.R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, pp. 99-134. 4
5 Sequential Decision Making. Two extreme cases: Planning horizon is 1: known as the greedy approach. Planning horizon is $\infty$: often treated as a discounted problem in which the payoff diminishes as the horizon goes to infinity. 5
6 Frontier-based Exploration. Frontier-based exploration using occupancy grid maps (OGMs) (Yamauchi 1997). 6
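Frontier cells can be extracted directly from an occupancy grid. A minimal Python sketch, assuming a grid encoding of 0 = free, 1 = occupied, and -1 = unknown (the encoding is an assumption of this example, not from the slides):

```python
import numpy as np

def find_frontiers(grid):
    """Return (row, col) indices of frontier cells in an occupancy grid.

    Assumed encoding: 0 = free, 1 = occupied, -1 = unknown.
    A frontier cell is a known-free cell with at least one
    4-connected unknown neighbor.
    """
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != 0:          # only free cells can be frontiers
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == -1:
                    frontiers.append((r, c))
                    break
    return frontiers
```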
7 Nearest Frontier. The cost function, $f_c : \mathcal{A}_t \to \mathbb{R}_{\geq 0}$, is the length of the path (e.g., using A*, RRT*, etc.) from the current robot pose to the corresponding frontier. Optimal action: $a_t^* = \operatorname{argmin}_{a_t \in \mathcal{A}_t} f_c(a_t)$. 7
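The nearest-frontier rule can be sketched with a grid BFS as a stand-in for the A*/RRT* path length (the BFS and the 0 = free grid encoding are simplifying assumptions of this example):

```python
from collections import deque

import numpy as np

def bfs_path_length(grid, start, goal):
    """Path length in cells through free space via BFS; None if unreachable.
    Assumed encoding: 0 = free (traversable), anything else blocked."""
    if start == goal:
        return 0
    rows, cols = grid.shape
    dist = {start: 0}
    q = deque([start])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt] == 0 and nxt not in dist):
                dist[nxt] = dist[(r, c)] + 1
                if nxt == goal:
                    return dist[nxt]
                q.append(nxt)
    return None

def nearest_frontier(grid, pose, frontiers):
    """a_t* = argmin_a f_c(a): the reachable frontier with the shortest path."""
    costs = [(bfs_path_length(grid, pose, f), f) for f in frontiers]
    costs = [(c, f) for c, f in costs if c is not None]
    return min(costs)[1] if costs else None
```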
8 Information Gain How rewarding is an action? 8
9 Where to Move Next? Select the target with the highest information gain. (Courtesy: C. Stachniss) 9
10 Information Theory. Entropy is a measure of the uncertainty of a random variable: $H(X) = \mathbb{E}_{p(x)}[\log \frac{1}{p(x)}] = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$. The conditional entropy: $H(Y \mid X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(y \mid x)$. 10
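Both definitions can be checked numerically. A small sketch (base-2 logarithms, so entropies are in bits, is a convention chosen for this example):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) = -sum p log2 p, in bits; 0 * log 0 := 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(pxy):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x) from a joint table pxy[x, y]."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1)
    h = 0.0
    for x in range(pxy.shape[0]):
        for y in range(pxy.shape[1]):
            if pxy[x, y] > 0:
                h -= pxy[x, y] * np.log2(pxy[x, y] / px[x])
    return h
```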
11 Information Theory. The Mutual Information (MI) is the reduction in the uncertainty of one random variable due to the knowledge of the other: $I(X;Y) = D_{KL}(p(x,y) \,\|\, p(x)p(y)) = \mathbb{E}_{p(x,y)}[\log \frac{p(x,y)}{p(x)p(y)}]$, or equivalently $I(X;Y) = H(X) - H(X \mid Y)$. 11
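The KL form of MI can be computed directly from a joint probability table. A small illustrative sketch (the 2x2 joint tables in the usage below are made up for the example):

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ), in bits."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1)   # marginal p(x)
    py = pxy.sum(axis=0)   # marginal p(y)
    mi = 0.0
    for x in range(pxy.shape[0]):
        for y in range(pxy.shape[1]):
            if pxy[x, y] > 0:
                mi += pxy[x, y] * np.log2(pxy[x, y] / (px[x] * py[y]))
    return mi
```

For a perfectly correlated pair the MI equals one bit; for independent variables it is zero, matching $I(X;Y) = H(X) - H(X \mid Y)$.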
12 Mutual Information-based Exploration. Direct calculation of the information gain using numerical integration techniques: $I(M; Z_{t+1} \mid z_{1:t}) = \underbrace{H(M \mid z_{1:t})}_{\text{map entropy}} - \underbrace{H(M \mid Z_{t+1}, z_{1:t})}_{\text{map conditional entropy}}$, where $M$ is the map, $Z_{t+1}$ the future observations, and $z_{1:t}$ the observations up to time $t$. 12
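Under the common independent-cell (Bernoulli) assumption, the map-entropy term is a sum of per-cell binary entropies. A sketch of that term only; evaluating the conditional-entropy term would additionally require integrating over the future measurements $Z_{t+1}$:

```python
import numpy as np

def map_entropy(occ_probs):
    """H(M) for an occupancy grid with independent Bernoulli cells:
    H(M) = sum_i H(p_i), H(p) = -p log2 p - (1-p) log2 (1-p), in bits.
    Probabilities are clipped away from {0, 1} for numerical safety."""
    p = np.clip(np.asarray(occ_probs, dtype=float).ravel(), 1e-12, 1 - 1e-12)
    return float(np.sum(-p * np.log2(p) - (1 - p) * np.log2(1 - p)))
```

An unexplored map (all cells at 0.5) has the maximum entropy of one bit per cell; a well-explored map (cells near 0 or 1) has entropy near zero.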
13 Maximum Information Gain. The information gain-based utility function is $f_I : \mathcal{A}_t \to \mathbb{R}_{\geq 0}$. Optimal action: $a_t^* = \operatorname{argmax}_{a_t \in \mathcal{A}_t} f_I(a_t)$. 13
14 Cost-Utility Trade-off. Let $g : \mathbb{R}_{\geq 0}^2 \to \mathbb{R}_{\geq 0}$ be a function that takes $f_I(a_t)$ and $f_c(a_t)$ as its input arguments. The total utility function: $u(a_t) \triangleq g(f_I(a_t), f_c(a_t))$. Optimal action: $a_t^* = \operatorname{argmax}_{a_t \in \mathcal{A}_t} u(a_t)$. 14
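One concrete choice of $g$ (an assumption of this sketch, not prescribed by the slides) is to discount the information gain exponentially by the path cost:

```python
import math

def utility(f_i, f_c, alpha=1.0):
    """One simple choice of g: u = f_I * exp(-alpha * f_c).
    alpha trades off information gain against path cost (assumed form)."""
    return f_i * math.exp(-alpha * f_c)

def best_action(actions, f_i, f_c):
    """a_t* = argmax_{a in A_t} u(a), with f_i and f_c given as
    dicts mapping each candidate action to its gain and cost."""
    return max(actions, key=lambda a: utility(f_i[a], f_c[a]))
```

With this $g$, a nearby frontier with modest gain can beat a distant frontier with high gain, which is the intended trade-off.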
15 Greedy Mutual Information-based Exploration Key idea: The MI-based utility function is computed at the centroids of geometric frontiers and the frontier with the highest utility is chosen as the next-best macro-action 3. Definition (Macro-action) A macro-action is an exploration target (frontier) which is assumed to be reachable through an open-loop control strategy. 3 He, R., Brunskill, E. and Roy, N., 2010, July. PUMA: Planning Under Uncertainty with Macro-Actions. In AAAI. 15
16 Robotic Information Gathering Mission. (Block diagram:) Perception System, whose output is a belief distribution over state variables; Planning and Decision-making, which consumes the belief and data, with replanning; Acting + Sensing, which executes the action and returns data. Q1: When to terminate the mission? Q2: When to stop planning? Q3: How to accept full perception uncertainty? 16
17 Planning Problem. How to get from A to B? 17
18 RRT. Randomized kinodynamic planning (LaValle and Kuffner 2001). (Figure: RRT extend operation, growing from $x_{init}$ by steering $x_{near}$ toward a sample to produce $x_{new}$.) 18
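The extend/steer step can be sketched for a 2D holonomic point robot; collision checking and the kinodynamic constraints of LaValle and Kuffner are omitted in this simplified sketch:

```python
import math
import random

def steer(x_near, x_rand, eps):
    """RRT extend: move from x_near toward x_rand by at most step eps,
    producing x_new."""
    dx, dy = x_rand[0] - x_near[0], x_rand[1] - x_near[1]
    d = math.hypot(dx, dy)
    if d <= eps:
        return x_rand
    return (x_near[0] + eps * dx / d, x_near[1] + eps * dy / d)

def rrt(x_init, sample_free, eps=0.5, n_iter=200, seed=0):
    """Grow a tree, stored as a dict child -> parent, rooted at x_init."""
    random.seed(seed)
    tree = {x_init: None}
    for _ in range(n_iter):
        x_rand = sample_free()                                  # sample
        x_near = min(tree, key=lambda v: math.dist(v, x_rand))  # nearest
        x_new = steer(x_near, x_rand, eps)                      # extend
        tree[x_new] = x_near
    return tree
```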
19 RRT* (Cost), RIG (Cost+Information gain) Sampling-based algorithms for optimal motion planning (Karaman and Frazzoli 2011) Rapidly-exploring Information Gathering (Hollinger and Sukhatme 2014)
20 Decision-Theoretic Planning. What if there is no B (no artificial targets) and the robot poses are uncertain? 20
21 Incremental Informative Motion Planning. Infinite-horizon planning: $P_t^* = \operatorname{argmax}_{P_t \in \mathcal{A}_t} f_I(P_t)$ subject to $\underbrace{f_c(P_t) \leq b_t}_{\text{budget constraint}}$, $t > t_s$, and $\underbrace{S = s_{0:t_s}}_{\text{state estimate}}$. 21
22 Incremental Informative Motion Planning. $P_t^* = \operatorname{argmax}_{P_t \in \mathcal{A}_t} f_I(P_t)$ subject to $\underbrace{f_c(P_t) \leq b_t}_{\text{budget constraint}}$, $t > t_s$, and $\underbrace{S = s_{0:t_s}}_{\text{state estimate}}$. Here $\mathcal{A}_t$ is the set of all possible trajectories (action space) at time $t$; $S = s_{0:t_s}$ is the state estimate up to time $t_s$; $f_I(P_t)$ is the information function; $f_c(P_t)$ is the cost function; $b_t$ is the budget. 22
23 Common Approximations and Simplifications (Not Desirable). Discretizing the state and/or action space. Making the state and/or action space finite. Greedy planning or planning for a limited/short horizon. Assuming the state or part of the state is fully observable (ignoring the uncertainty). Assuming maximum likelihood observations (optimistic). 23
24 Incrementally-exploring Information Gathering. The IIG framework: cost + information gain + convergence. 1. The belief representation can be dense. 2. Information-theoretic convergence. 3. It takes into account all future measurements. 4. An information-theoretic interpretation of the planning horizon. 24
25 Information Functions Algorithms. Mutual Information (MI): direct. Gaussian Processes Variance Reduction (GPVR): non-parametric. Algorithmic implementations are available in the paper. 25
26 Non-parametric Information Functions. The predictive variance of a Gaussian process does not depend on the actual realization of the observations: $\mathbb{V}[f_*] = k(x_*, x_*) - k(X, x_*)^T [K(X, X) + \sigma_n^2 I_n]^{-1} k(X, x_*)$. The mutual information between the state $X$ and observations $Z$ can be approximated as $\hat{I}(X;Z) = \sum_{i=1}^n \log(\sigma_{X_i}) - \sum_{i=1}^n \log(\sigma_{X_i \mid Z})$, where $\sigma_{X_i}$ and $\sigma_{X_i \mid Z}$ are the prior and posterior marginal variances. 26
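Both formulas can be exercised on a toy GP. The squared-exponential kernel and its hyperparameters below are assumptions of this sketch, not values from the slides:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel k(a, b) (assumed kernel choice)."""
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * d2 / ell**2)

def gp_posterior_var(X, x_star, sigma_n=0.1):
    """V[f*] = k(x*,x*) - k(X,x*)^T [K + sigma_n^2 I]^{-1} k(X,x*).
    Depends only on the input locations X, not the observed values."""
    K = rbf(X, X) + sigma_n**2 * np.eye(len(X))
    k_star = rbf(X, x_star)
    v = rbf(x_star, x_star) - k_star.T @ np.linalg.solve(K, k_star)
    return np.diag(v)

def gpvr_info(prior_var, post_var):
    """I_hat(X;Z) = sum_i log sigma_i - sum_i log sigma_{i|Z}."""
    return float(np.sum(np.log(np.sqrt(prior_var))) -
                 np.sum(np.log(np.sqrt(post_var))))
```

Observing near a query point shrinks its posterior variance, so the approximated information gain is positive there.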
27 Non-parametric Information Functions: UGPVR. How to incorporate the robot pose uncertainty? Take the expectation of the kernel with respect to the probability distribution function $p(x)$: $\bar{k} = \mathbb{E}[k] = \int_{\mathcal{X}} k \, p(x) \, dx$. 27
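When no closed form is available, the expected kernel can be approximated by Monte Carlo sampling from the pose belief. A Gaussian pose belief is assumed in this sketch; closed-form expressions exist for some kernels such as the RBF:

```python
import numpy as np

def expected_kernel(k, x_mean, x_cov, y, n_samples=2000, seed=0):
    """Monte Carlo estimate of k_bar = E_{x ~ p(x)}[k(x, y)] under a
    Gaussian pose belief p(x) = N(x_mean, x_cov)."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(x_mean, x_cov, size=n_samples)
    return float(np.mean([k(x, y) for x in xs]))
```

Averaging the kernel over the pose distribution deflates correlations, which is how pose uncertainty enters the UGPVR information function.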
28 Convergence of the Planner. The relative information contribution of node $n_{new}$: $\mathrm{RIC} \triangleq \frac{I_{new}}{I_{near}} - 1$, where $n_{sample}$ is the number of samples it takes to find $n_{new}$. The penalized relative information contribution: $I_{RIC} \triangleq \frac{\mathrm{RIC}}{n_{sample}}$. 28
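The two quantities are a direct transcription of the definitions; a trivial sketch:

```python
def penalized_ric(i_new, i_near, n_sample):
    """RIC = I_new / I_near - 1, penalized by the number of samples
    needed to find the new node: I_RIC = RIC / n_sample."""
    ric = i_new / i_near - 1.0
    return ric / n_sample
```

As nodes become harder to find (larger `n_sample`) or add little information, `I_RIC` decays toward zero, which is what the convergence test exploits.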
29 Incrementally-exploring Information Gathering. $I_{RIC}$ is non-dimensional. $\delta_{RIC}$ sets the planning horizon ($T$) from the information-gathering point of view. Using smaller values of $\delta_{RIC}$, the planner can reach farther points in both the spatial and belief spaces [4]: if $\delta_{RIC} \to 0$, then $T \to \infty$. [4] See Algorithm 2, IIG-tree, in the paper. 29
30 Convergence of the Planner 30
31 Information-theoretic Robotic Exploration When to stop the information gathering mission? 31
32 Information-theoretic Robotic Exploration. Definition (Map saturation probability): The probability that the robot is completely confident about the occupancy status of a point is defined as $p_{sat}$. Definition (Map saturation entropy): The entropy of a point from a map whose occupancy probability is $p_{sat}$ is defined as $h_{sat} \triangleq H(p_{sat})$. 32
33 Information Theory (Cover and Thomas 1991). Theorem (Conditioning reduces entropy; information cannot hurt): $H(X \mid Y) \leq H(X)$. Theorem (Chain rule for entropy): $H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n H(X_i \mid X_{i-1}, \ldots, X_1)$. 33
34 Information Theory (Cover and Thomas 1991). Theorem (Independence bound on entropy): $H(X_1, X_2, \ldots, X_n) \leq \sum_{i=1}^n H(X_i)$. Proof sketch: expand the LHS using the chain rule for entropy, then apply conditioning reduces entropy to each term. 34
35 Information-theoretic Robotic Exploration. Theorem (The least upper bound of the average map entropy): Let $n \in \mathbb{N}$ be the number of map points. In the limit, for a completely explored occupancy map, the least upper bound of the average map entropy is given by $\sup \frac{1}{n} H(M) = H(p_{sat}) = h_{sat}$. 35
36 Information-theoretic Robotic Exploration. Proof. From the independence bound on entropy, multiplying each side of the inequality by $\frac{1}{n}$, the average map entropy satisfies $\frac{1}{n} H(M) \leq \frac{1}{n} \sum_{i=1}^n H(M = m^{[i]})$. Taking the limit as $p(m) \to p_{sat}$, $\lim_{p(m) \to p_{sat}} \frac{1}{n} H(M) \leq \lim_{p(m) \to p_{sat}} \frac{1}{n} \sum_{i=1}^n H(M = m^{[i]}) = H(p_{sat})$, hence $\sup \frac{1}{n} H(M) = H(p_{sat})$. 36
37 Information-theoretic Robotic Exploration. Remark: The result also extends to continuous random variables and differential entropy. Remark: Note that we do not assume any distribution for map points; the entropy can be calculated either by assuming the map points are normally distributed or by treating them as Bernoulli random variables. Remark: Since $0 < p_{sat} < 1$ and $H(p_{sat}) = H(1 - p_{sat})$, one saturation entropy can be set for the entire map. 37
38 Map Exploration Termination. Corollary (Map exploration termination): Autonomous robotic exploration for mapping can be terminated when $\frac{1}{n} \sum_{i=1}^n H(M = m^{[i]}) \leq H(p_{sat})$. Setting a threshold in the information space is the natural consideration of uncertainty in the robot perception. 38
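The termination test reduces to comparing the average per-cell binary entropy with the saturation entropy. A sketch assuming Bernoulli map cells; the example value $p_{sat} = 0.99$ is an assumption, not taken from the slides:

```python
import numpy as np

def bernoulli_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), in bits (clipped for safety)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def exploration_done(occ_probs, p_sat=0.99):
    """Terminate when the average map entropy drops to the saturation
    entropy h_sat = H(p_sat)."""
    h_sat = bernoulli_entropy(p_sat)
    return bool(np.mean(bernoulli_entropy(occ_probs)) <= h_sat)
```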
39 Information Gathering Termination. Corollary (Information gathering termination): Given a saturation entropy $h_{sat}$, the problem of search for information gathering for desired random variables $X_1, X_2, \ldots, X_n$ whose support is the alphabet $\mathcal{X}$ can be terminated when $\frac{1}{n} \sum_{i=1}^n H(X_i) \leq h_{sat}$. Regardless of the quantity of interest, this provides a stopping criterion for the exploration mission. 39
40 Robotic Exploration in Unknown Environment Comparison of Active Pose SLAM (APS) (Valencia et al. 2012) and IIG; Cave dataset. 40
41 Lake Monitoring Experiment. Survey of the lake area using an Autonomous Surface Vehicle (Hollinger and Sukhatme 2014). The wireless signal strength map of the lake area is regressed using a Gaussian Process; the map is used as a proxy for ground truth. (a) Survey (b) Mean surface (c) Variance surface 41
42 Lake Monitoring Experiment. The ASV can localize using a GPS unit and a Doppler Velocity Log. Communication with the ground station is through a wireless connection, and at any location the wireless signal strength (WSS) can be measured in dBm. The dataset includes about 2700 observations and was collected through a full survey of the lake area at Puddingstone Lake in San Dimas, CA. At any location the robot can take measurements within a sensing range from the ground-truth maps. 42
43 IIG-tree using UGPVR. Use the collected measurements along the most informative path extracted from the IIG graph and rebuild the GP WSS mean surface. RMSE: 4.26 ± 0.05 dBm; Time: ± 0.67 sec; the numbers are averaged over 100 runs (mean ± standard error). 43
44 Some Related Readings. Thrun, S., Burgard, W. and Fox, D. (2005). Probabilistic Robotics. Yamauchi, B. (1997). A Frontier-based Approach for Autonomous Exploration. In IEEE International Symposium on Computational Intelligence in Robotics and Automation. Stachniss, C., Grisetti, G. and Burgard, W. (2005). Information Gain-based Exploration using Rao-Blackwellized Particle Filters. In Robotics: Science and Systems. Atanasov, N. A. Active Information Acquisition with Mobile Robots. PhD Thesis, University of Pennsylvania. Hollinger, G.A. and Sukhatme, G.S. (2014). Sampling-based Robotic Information Gathering Algorithms. The International Journal of Robotics Research, 33(9). Ghaffari Jadidi, M., Valls Miro, J. and Dissanayake, G. Sampling-based Incremental Information Gathering with Applications to Robotic Exploration and Environmental Monitoring. arXiv preprint. 44
More informationChapter 16 Planning Based on Markov Decision Processes
Lecture slides for Automated Planning: Theory and Practice Chapter 16 Planning Based on Markov Decision Processes Dana S. Nau University of Maryland 12:48 PM February 29, 2012 1 Motivation c a b Until
More informationLecture Note 1: Probability Theory and Statistics
Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would
More informationComparison of Information Theory Based and Standard Methods for Exploration in Reinforcement Learning
Freie Universität Berlin Fachbereich Mathematik und Informatik Master Thesis Comparison of Information Theory Based and Standard Methods for Exploration in Reinforcement Learning Michael Borst Advisor:
More informationMotivation for introducing probabilities
for introducing probabilities Reaching the goals is often not sufficient: it is important that the expected costs do not outweigh the benefit of reaching the goals. 1 Objective: maximize benefits - costs.
More informationCapacity of the Discrete Memoryless Energy Harvesting Channel with Side Information
204 IEEE International Symposium on Information Theory Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information Omur Ozel, Kaya Tutuncuoglu 2, Sennur Ulukus, and Aylin Yener
More informationFinal Exam December 12, 2017
Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes
More informationProbabilistic Graphical Models for Image Analysis - Lecture 1
Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.
More informationFinal Exam December 12, 2017
Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes
More informationMarks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:
Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More information1 MDP Value Iteration Algorithm
CS 0. - Active Learning Problem Set Handed out: 4 Jan 009 Due: 9 Jan 009 MDP Value Iteration Algorithm. Implement the value iteration algorithm given in the lecture. That is, solve Bellman s equation using
More informationGradient Sampling for Improved Action Selection and Path Synthesis
Gradient Sampling for Improved Action Selection and Path Synthesis Ian M. Mitchell Department of Computer Science The University of British Columbia November 2016 mitchell@cs.ubc.ca http://www.cs.ubc.ca/~mitchell
More informationProbabilistic Robotics
University of Rome La Sapienza Master in Artificial Intelligence and Robotics Probabilistic Robotics Prof. Giorgio Grisetti Course web site: http://www.dis.uniroma1.it/~grisetti/teaching/probabilistic_ro
More informationLecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning
More informationLECTURE 3. Last time:
LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationDistributed Optimization. Song Chong EE, KAIST
Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links
More informationEfficient Maximization in Solving POMDPs
Efficient Maximization in Solving POMDPs Zhengzhu Feng Computer Science Department University of Massachusetts Amherst, MA 01003 fengzz@cs.umass.edu Shlomo Zilberstein Computer Science Department University
More informationNotes from Week 9: Multi-Armed Bandit Problems II. 1 Information-theoretic lower bounds for multiarmed
CS 683 Learning, Games, and Electronic Markets Spring 007 Notes from Week 9: Multi-Armed Bandit Problems II Instructor: Robert Kleinberg 6-30 Mar 007 1 Information-theoretic lower bounds for multiarmed
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationUsing first-order logic, formalize the following knowledge:
Probabilistic Artificial Intelligence Final Exam Feb 2, 2016 Time limit: 120 minutes Number of pages: 19 Total points: 100 You can use the back of the pages if you run out of space. Collaboration on the
More informationBioinformatics: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA Model Building/Checking, Reverse Engineering, Causality Outline 1 Bayesian Interpretation of Probabilities 2 Where (or of what)
More informationEfficient Information Planning in Graphical Models
Efficient Information Planning in Graphical Models computational complexity considerations John Fisher & Giorgos Papachristoudis, MIT VITALITE Annual Review 2013 September 9, 2013 J. Fisher (VITALITE Annual
More information