Course on Automated Planning: Transformations


Hector Geffner, ICREA & Universitat Pompeu Fabra, Barcelona, Spain. Rome, July 2010.

AI Planning: Status

The good news: classical planning works!
- Large problems are solved very fast (non-optimally)
- The model is simple but useful
- Operators need not be primitive; they can be policies themselves
- Fast closed-loop replanning is able to cope with uncertainty sometimes

Not so good; the limitations:
- Does not model uncertainty (no probabilities)
- Does not deal with incomplete information (no sensing)
- Does not accommodate preferences (simple cost structure)
- ...

Beyond Classical Planning: Two Strategies

Top-down: develop solvers for more general classes of models, e.g., Markov Decision Processes (MDPs), Partially Observable MDPs (POMDPs), ...
- Pro: generality. Con: complexity.

Bottom-up: extend the scope of current classical solvers.
- Pro: efficiency. Con: generality.

We'll do both, starting with transformations for:
- compiling soft goals away (planning with preferences)
- compiling uncertainty away (conformant planning)
- compiling sensing away (planning with sensing)
- doing plan recognition (as opposed to plan generation)

Compilation of Soft Goals

Planning with soft goals is aimed at plans π that maximize the utility

u(π) = Σ_{p ∈ do(π,s0)} u(p) − Σ_{a ∈ π} c(a)

- Actions have cost c(a), and soft goals p have utility u(p)
- The best plans achieve the best tradeoff between action costs and utilities
- The model was used in recent planning competitions: the net-benefit track of the 2008 IPC
- Yet it turns out that soft goals do not add expressive power, and can be compiled away

Compilation of Soft Goals (cont'd)

For each soft goal p, create a new hard goal p', initially false, and two new actions:
- collect(p), with precondition p, effect p', and cost 0
- forgo(p), with an empty precondition, effect p', and cost u(p)

Plans π maximize u(π) iff they minimize c(π) = Σ_{a ∈ π} c(a) in the resulting problem.

The compilation yields better results than native soft-goal planners in the recent IPC (Keyder & G. 07, 09):

                     IPC6 Net-Benefit Track     Compiled Problems
Domain               Gamer  HSP*P  Mips-XXL   Gamer  HSP*F  HSP*0  Mips-XXL
crewplanning (30)      4     16      8          -      8     21      8
elevators (30)        11      5      4         18      8      8      3
openstacks (30)        7      5      2          6      4      6      1
pegsol (30)           24      0     23         22     26     14     22
transport (30)        12     12      9          -     15     15      9
woodworking (30)      13     11      9          -     23     22      7
total                 71     49     55          -     84     86     50
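As a concrete illustration, here is a minimal Python sketch of this compilation over a STRIPS-like representation with action costs; the Action class and the fluent names are hypothetical stand-ins, not any planner's API:

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    precond: frozenset   # literals that must hold
    effects: frozenset   # literals made true
    cost: float = 1.0

def compile_soft_goals(actions, hard_goals, soft_goals):
    """Compile soft goals {p: u(p)} away: each p gets a new hard goal p',
    achievable either by collect(p) (needs p, costs 0) or by forgo(p)
    (needs nothing, costs the forgone utility u(p))."""
    new_actions = list(actions)
    new_goals = set(hard_goals)
    for p, utility in soft_goals.items():
        p_prime = p + "'"            # new hard goal, initially false
        new_goals.add(p_prime)
        new_actions.append(Action("collect(%s)" % p,
                                  frozenset([p]), frozenset([p_prime]), 0.0))
        new_actions.append(Action("forgo(%s)" % p,
                                  frozenset(), frozenset([p_prime]), utility))
    return new_actions, new_goals

# Plans maximizing utility in the original problem are exactly the plans
# minimizing total cost in the compiled one.
acts, goals = compile_soft_goals([], {"at-depot"}, {"delivered-pkg1": 5.0})
for a in acts:
    print(a.name, a.cost)
```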

Incomplete Information: Conformant Planning

[Figure: an n×n grid with an uncertain initial position I and a goal cell G]

Problem: a robot must move from an uncertain I into G with certainty, one cell at a time, in an n×n grid.

The problem is very much like a classical planning problem, except for the uncertain I.

The plans, however, are quite different: the best conformant plan must move the robot to a corner first (localization).

Conformant Planning: Belief State Formulation

[Figure: the same grid, viewed in belief space]

- Call a set of possible states a belief state
- Actions then map a belief state b into a belief state b_a = { s' | s' ∈ F(a,s) and s ∈ b }
- The conformant problem becomes a path-finding problem in belief space

Problem: the number of belief states is doubly exponential in the number of variables. We need:
- an effective representation of belief states b
- an effective heuristic h(b) for estimating cost in belief space

A recent alternative: translate into classical planning ...
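The progression of belief states can be written directly from the definition; a minimal sketch, assuming a nondeterministic successor function F(a, s) that returns the set of states action a may yield in state s (all names hypothetical):

```python
def progress(belief, action, F):
    """b_a = { s' | s' in F(a, s) for some s in b }"""
    return frozenset(s2 for s in belief for s2 in F(action, s))

# Toy 1-D version of the localization effect: moving left against the wall
# shrinks the belief, which is why heading to a corner localizes the robot.
def F(action, s):
    return {max(0, s - 1)} if action == "left" else {min(3, s + 1)}

b = frozenset({0, 1, 2, 3})    # position completely unknown
for _ in range(3):
    b = progress(b, "left", F)
print(b)                       # frozenset({0}): the robot is localized
```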

Basic Translation: Move to the Knowledge Level

Given a conformant problem P = ⟨F, O, I, G⟩:
- F stands for the fluents in P
- O for the operators with effects C → L
- I for the initial situation (clauses over F-literals)
- G for the goal situation (set of F-literals)

Define the classical problem K0(P) = ⟨F', O', I', G'⟩ as:
- F' = { KL, K¬L | L ∈ F }
- I' = { KL | clause L ∈ I }
- G' = { KL | L ∈ G }
- O' = O, but with preconditions L replaced by KL, and each effect C → L replaced by KC → KL (support) and ¬K¬C → ¬K¬L (cancellation)

K0(P) is sound but incomplete: every classical plan that solves K0(P) is a conformant plan for P, but not vice versa.
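A minimal sketch of the K0 translation, with negation of a literal encoded by a '-' prefix and operators given as (name, preconditions, conditional effects) triples; the representation is hypothetical:

```python
def neg(l):                       # complement of a literal: 'p' <-> '-p'
    return l[1:] if l.startswith("-") else "-" + l

def K(l):                         # KL: literal L is known
    return "K" + l

def k0(fluents, operators, init_clauses, goal):
    """K0(P): sound but incomplete knowledge-level compilation."""
    F1 = {K(l) for f in fluents for l in (f, neg(f))}
    I1 = {K(c[0]) for c in init_clauses if len(c) == 1}   # known unit clauses
    G1 = {K(l) for l in goal}
    O1 = []
    for name, pre, effects in operators:          # effects: (C, L) pairs
        new_effects = []
        for C, L in effects:
            # support: K C -> K L
            new_effects.append(([K(c) for c in C], K(L)))
            # cancellation: -K-C -> -K-L
            new_effects.append(([neg(K(neg(c))) for c in C], neg(K(neg(L)))))
        O1.append((name, [K(l) for l in pre], new_effects))
    return F1, O1, I1, G1
```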

Key Elements in the Complete Translation K_{T,M}(P)

- A set T of tags t: consistent sets of assumptions (literals) about the initial situation I, i.e., I ⊭ ¬t
- A set M of merges m: valid sets of tags (= DNF), i.e., I ⊨ ⋁_{t ∈ m} t
- New (tagged) literals KL/t, meaning that L is true if t was true initially

A More General Translation K_{T,M}(P)

Given a conformant problem P = ⟨F, O, I, G⟩:
- F stands for the fluents in P
- O for the operators with effects C → L
- I for the initial situation (clauses over F-literals)
- G for the goal situation (set of F-literals)

Define the classical problem K_{T,M}(P) = ⟨F', O', I', G'⟩ as:
- F' = { KL/t, K¬L/t | L ∈ F and t ∈ T }
- I' = { KL/t | I ⊨ t → L }
- G' = { KL | L ∈ G }
- O' = O, but with preconditions L replaced by KL, each effect C → L replaced by KC/t → KL/t (supports) and ¬K¬C/t → ¬K¬L/t (cancellation), plus new merge actions, one for each merge m ∈ M and literal L:
  ⋀_{t ∈ m} KL/t → KL

The two parameters T and M are the set of tags (assumptions) and the set of merges (valid sets of assumptions) ...
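For intuition, a tiny hypothetical example: if I contains the clause x1 ∨ x2, then the tags x1 and x2 form a valid merge, and the merge action for a literal L fires once KL/t holds for every tag t in the merge:

```python
def merge_action(L, m):
    """Merge action for literal L and merge m: /\_{t in m} KL/t  ->  KL"""
    return [f"K{L}/{t}" for t in m], f"K{L}"

# I |= x1 v x2, so m = [x1, x2] is a valid merge (a DNF entailed by I)
pre, eff = merge_action("at-corner", ["x1", "x2"])
print(pre, "->", eff)   # ['Kat-corner/x1', 'Kat-corner/x2'] -> Kat-corner
```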

Compiling Uncertainty Away: Properties

- The general translation scheme K_{T,M}(P) is always sound, and for a suitable choice of the sets of tags and merges, it is complete
- K_{S0}(P) is a complete instance of K_{T,M}(P), obtained by setting T to the set of possible initial states of P
- K_i(P) is a polynomial instance of K_{T,M}(P) that is complete for problems with width bounded by i; the merges for each L in K_i(P) are chosen to satisfy i clauses in I relevant to L
- The width of most benchmarks is bounded and equal to 1!

This means that such problems can be solved with a classical planner after a polynomial translation (Palacios & G. 07, 09).

Planning with Sensing: Models and Solutions

Problem: starting in one of the two rightmost cells, get to B; A and B are observable.

[Figure: a row of cells with marks A and B]

Contingent planning:
- A contingent plan is a tree of possible executions, all leading to the goal
- A contingent plan for the problem: R(ight), R, R, if ¬B then R

POMDP planning:
- A POMDP policy is a mapping of belief states to actions, leading to the goal
- A POMDP policy for the problem: if Bel ≠ {B}, then R (2^5 − 1 possible beliefs)

I'll focus on a different solution form: finite-state controllers.

Finite State Controllers: Example 1

Starting in A, move to B and back to A; the marks A and B are observable.

[Figure: a row of cells with marks A and B at the ends]

This finite-state controller solves the problem:

q0 --(-/Right)--> q0    q0 --(B/Left)--> q1
q1 --(-/Left)--> q1     q1 --(A/Right)--> q0

The FSC is compact and general: one can add noise, vary the distance, etc.

FSCs are heavily used in practice, e.g. in video games and robotics, but they are written by hand.

The challenge: how to get these controllers automatically.

Finite State Controllers: Example 2

Problem P: find a green block using a visual marker (circle) that can move around one cell at a time (à la Chapman and Ballard).

Observables: whether the marked cell contains a green block (G), a non-green block (B), or neither (C); and whether it is on the table (T) or not (-).

[Controller diagram:]
q0 --(TC/Right)--> q0   q0 --(TB/Up)--> q0   q0 --(-B/Up)--> q0   q0 --(-C/Down)--> q1
q1 --(-B/Down)--> q1    q1 --(TB/Right)--> q0

This controller solves the problem, and not only that: it is compact and general, and applies to any number of blocks and any configuration!

The controller was obtained by running a classical planner over a transformed problem (Bonet, Palacios, G. 2009).

Some Notation: Problems and Finite State Controllers

- The target problem P is like a classical problem with an incomplete initial situation I and some observable fluents
- A finite-state controller C is a set of tuples t = ⟨q, o, a, q'⟩
- A tuple t = ⟨q, o, a, q'⟩, depicted q --(o/a)--> q', says to do action a when o is observed in controller state q, and then to switch to q'
- A finite-state controller C solves P if all state trajectories compatible with P and C reach the goal

Question: how to derive an FSC for solving P?
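Executing such a controller is a simple loop over the tuples; a sketch, with observe, step, and at_goal as hypothetical hooks into the environment, run here on the Example 1 controller:

```python
def run_fsc(tuples, q0, observe, step, at_goal, max_steps=100):
    """Run an FSC given as tuples (q, o, a, q'): in state q, on observation o,
    do action a and switch to q'."""
    table = {(q, o): (a, q2) for q, o, a, q2 in tuples}
    q = q0
    for _ in range(max_steps):
        if at_goal():
            return True
        key = (q, observe())
        if key not in table:        # no matching tuple: controller is stuck
            return False
        a, q = table[key]
        step(a)
    return False

# Example 1 controller on a 5-cell corridor (A at cell 0, B at cell 4),
# started at an interior cell for simplicity:
C = [("q0", "-", "Right", "q0"), ("q0", "B", "Left", "q1"),
     ("q1", "-", "Left", "q1"), ("q1", "A", "Right", "q0")]
state = {"pos": 2, "seen_B": False}
def observe():
    return "A" if state["pos"] == 0 else "B" if state["pos"] == 4 else "-"
def step(a):
    state["pos"] += 1 if a == "Right" else -1
    state["seen_B"] |= state["pos"] == 4
print(run_fsc(C, "q0", observe, step,
              lambda: state["seen_B"] and state["pos"] == 0))   # True
```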

Idea: Finite State Controllers as Conformant Plans

Consider the set of possible tuples t = ⟨q, o, a, q'⟩, and let P' be a problem that is like P but with:
1. no observable fluents,
2. new fluents o and q representing the possible joint observations o and controller states q,
3. actions b(t) replacing the actions a in P, where for t = ⟨q, o, a, q'⟩, b(t) is like a but conditional on both q and o being true, and resulting in q'.

Theorem: the finite-state controller C solves P iff C is the set of tuples t in the actions b(t) of a stationary conformant plan for P'.

Corollary: the finite-state controller for P can be obtained with a classical planner from a further transformation of P'.

A plan π is stationary when for b(t) and b(t') in π with t = ⟨q, o, a, q'⟩ and t' = ⟨q, o, a', q''⟩, it holds that a = a' and q' = q''.

Intuition: Memoryless Controllers

For simplicity, consider memoryless controllers whose tuples are t = ⟨o, a⟩, meaning do a when o is observed.

In the transformed problem P', each action a in P is replaced by actions a(o), where a(o) is like a when o is true, and is a NO-OP otherwise.

Claim: if the memoryless controller C = { ⟨o_i, a_i⟩ | i = 1, ..., n } solves P in m steps, then the sequence a_1(o_1), ..., a_n(o_n) repeated m times is a conformant plan for P'.
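A sketch of this construction, with hypothetical apply_action and holds hooks; note how a(o) degenerates to a NO-OP when o is false, which is what makes the unrolled sequence conformant:

```python
def conditioned(a, o, apply_action, holds):
    """a(o): behaves like a when observation o holds, as a NO-OP otherwise."""
    def a_of_o(state):
        return apply_action(a, state) if holds(o, state) else state
    return a_of_o

def unroll(controller, m, apply_action, holds):
    """For C = [(o1,a1),...,(on,an)] solving P in m steps, the sequence
    a1(o1),...,an(on) repeated m times is a conformant plan for P'."""
    return [conditioned(a, o, apply_action, holds)
            for _ in range(m) for (o, a) in controller]
```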

Example: FSC for the Visual Marker Problem

Problem P: find a green block using a visual marker (circle) that can move around one cell at a time (à la Chapman or Ballard).

Observables: whether the marked cell contains a green block (G), a non-green block (B), or neither (C); and whether it is on the table (T) or not (-).

[Controller diagram, as in Example 2:]
q0 --(TC/Right)--> q0   q0 --(TB/Up)--> q0   q0 --(-B/Up)--> q0   q0 --(-C/Down)--> q1
q1 --(-B/Down)--> q1    q1 --(TB/Right)--> q0

The controller was obtained using a classical planner from a translation that assumes 2 controller states. The controller is compact and general: it applies to any number of blocks and any configuration.

Plan Recognition

[Figure: a grid with possible targets A, B, C, D, E, F, H, J and a start cell S]

- The agent can move one unit in the four directions
- The possible targets are A, B, C, ...
- Starting in S, he is observed to move up twice
- Where is he going?

Standard Plan Recognition over Libraries (Abstract View)

A plan recognition problem is defined by a triplet T = ⟨G, Π, O⟩ where:
- G is the set of possible goals G
- Π(G) is the set of possible plans π for G, G ∈ G
- O is an observation sequence a_1, ..., a_n, where each a_i is an action

A possible goal G ∈ G is plausible if there is a plan π in Π(G) that satisfies O.

An action sequence π satisfies O if O is a subsequence of π.
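The satisfaction test is just a subsequence check; a minimal sketch:

```python
def satisfies(plan, obs):
    """True iff the observation sequence obs is a subsequence of plan."""
    it = iter(plan)
    return all(o in it for o in obs)   # 'in' consumes the iterator up to a match

print(satisfies(["up", "up", "right", "up"], ["up", "right"]))   # True
print(satisfies(["right", "up"], ["up", "right"]))               # False
```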

(Classical) Plan Recognition over Action Theories

PR over action theories is similar, but with the set of plans Π(G) defined implicitly. A plan recognition problem is a triplet T = ⟨P, G, O⟩ where:
- P = ⟨F, A, I⟩ is a planning domain: fluents F, actions A, initial situation I, and no goal
- G is a set of possible goals G, G ⊆ F
- O is the observation sequence a_1, ..., a_n, all a_i in A

If Π(G) stands for the good plans for G in P (to be defined), then as before:
- A possible goal G ∈ G is plausible if there is a plan π in Π(G) that satisfies O
- An action sequence π satisfies O if O is a subsequence of π

Our goal: define the good plans and solve the problem with a classical planner.

Compiling Observations Away

We get rid of the observations O by transforming P = ⟨F, I, A⟩ into P' = ⟨F', I', A'⟩ so that:
- π is a plan for G in P that satisfies O iff π is a plan for G + O in P', and
- π is a plan for G in P that doesn't satisfy O iff π is a plan for G + ¬O in P'

The transformation from P into P' is actually very simple ...

Compiling Observations Away (cont'd)

Given P = ⟨F, I, A⟩, the transformed problem is P' = ⟨F', I', A'⟩:
- F' = F ∪ { p_a | a ∈ O }, I' = I, A' = A
- where p_a is a new fluent for the observed action a in A, added to a as an extra effect: p_a, if a is the first observation in O, and p_b → p_a, if b is the action that immediately precedes a in O

The goals O and ¬O in P' are p_a and ¬p_a for the last action a in O.

The plans π for G in P that satisfy / don't satisfy O are the plans in P' for G + O / G + ¬O respectively.
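A minimal sketch of the transformation, assuming each action is observed at most once and actions carry a mutable list of conditional effects (condition, literal); the representation is hypothetical:

```python
def compile_observations(fluents, actions, obs):
    """Add chaining fluents p_a for the observed actions a1,...,an: p_a1 is an
    unconditional extra effect of a1, and p_{a_{i-1}} -> p_{a_i} a conditional
    extra effect of a_i. Returns the new fluents and the goal literal for O
    (its negation is the goal for not-O)."""
    new_fluents = set(fluents)
    prev = None
    for a in obs:
        p_a = "p_" + a
        new_fluents.add(p_a)
        actions[a].append(([] if prev is None else [prev], p_a))
        prev = p_a
    return new_fluents, prev      # prev = p_a for the last observed action

# Toy usage: actions as name -> list of (condition, effect-literal) pairs
acts = {"up": [([], "row+1")], "right": [([], "col+1")]}
F2, goal_O = compile_observations({"row+1", "col+1"}, acts, ["up", "right"])
print(goal_O, acts["right"])   # p_right [([], 'col+1'), (['p_up'], 'p_right')]
```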

Plan Recognition as Planning: First Formulation

Define the set Π(G) of good plans for G in P as the optimal plans for G in P. Then G ∈ G is a plausible goal given the observations O
- iff there is an optimal plan π for G in P that satisfies O;
- iff there is an optimal plan π for G in P that is a plan for G + O in P';
- iff the cost of G in P is equal to the cost of G + O in P', abbreviated c_{P'}(G + O) = c_P(G)

It follows that the plausibility of G can be computed exactly by calling an optimal planner twice: once for computing c_{P'}(G + O), once for computing c_P(G). In turn, this can be approximated by calling a suboptimal planner just once (Ramirez & G. 2009).

We pursue a more general approach here ...

Plan Recognition as Planning: A More General Formulation

Don't filter goals G as plausible/implausible; rather, rank them with a probability distribution P(G|O), G ∈ G.

From Bayes' rule, P(G|O) = α P(O|G) P(G), where α is a normalizing constant:
- P(G) is assumed to be given in the problem specification
- P(O|G) is defined in terms of the extra cost to pay for not complying with the observations O:

P(O|G) = function(c(G + ¬O) − c(G + O))

Example: Navigation in a Grid Revisited

[Figure: the same grid with targets A, B, C, D, E, F, H, J and start S]

If Δ(G, O) def= c(G + ¬O) − c(G + O):
- For G = B: c(B + O) = c(B) = 4 and c(B + ¬O) = 6; thus Δ(B, O) = 2
- For G = C or A: c(C + O) = c(C + ¬O) = c(C) = 8; thus Δ(C, O) = 0
- For all other G: c(G + O) = 8 and c(G + ¬O) = c(G) = 4; thus Δ(G, O) = −4

If P(O|G) is a monotonic function of Δ(G, O), then P(O|B) > P(O|C) = P(O|A) > P(O|G), for G ∉ {A, B, C}.

Defining the Likelihoods P(O|G)

Assuming a Boltzmann distribution and writing exp{x} for e^x, the likelihoods become

P(O|G) def= α exp{−β c(G + O)}
P(¬O|G) def= α exp{−β c(G + ¬O)}

where α is a normalizing constant and β is a positive constant. Taking the ratio of the two equations, it follows that

P(O|G) / P(¬O|G) = exp{β Δ(G, O)}

and hence

P(O|G) = 1 / (1 + exp{−β Δ(G, O)}) = sigmoid(β Δ(G, O))
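In code, with a quick numeric check that the sigmoid form agrees with the normalized Boltzmann pair (the costs are those of goal B in the grid example):

```python
import math

def likelihood(c_G_O, c_G_notO, beta=1.0):
    """P(O|G) = sigmoid(beta * Delta(G,O)), Delta = c(G+notO) - c(G+O)."""
    return 1.0 / (1.0 + math.exp(-beta * (c_G_notO - c_G_O)))

c_o, c_no, beta = 4.0, 6.0, 1.0            # c(B+O) = 4, c(B+notO) = 6
p_o, p_no = math.exp(-beta * c_o), math.exp(-beta * c_no)
assert abs(p_o / (p_o + p_no) - likelihood(c_o, c_no, beta)) < 1e-12
print(likelihood(c_o, c_no))               # ~0.88
```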

Defining the Likelihoods P(O|G) (cont'd)

P(O|G) = sigmoid(β Δ(G, O)), with Δ(G, O) = c(G + ¬O) − c(G + O)

E.g.:
- P(O|G) < P(¬O|G) if c(G + ¬O) < c(G + O)
- P(O|G) = 1 if c(G + O) < c(G + ¬O) = ∞

Probabilistic Plan Recognition as Planning: Summary

A plan recognition problem is a tuple T = ⟨P, G, O, Prob⟩ where:
- P = ⟨F, I, A⟩ is a planning domain
- G is a set of possible goals G, G ⊆ F
- O is the observation sequence a_1, ..., a_n, with a_i ∈ A
- Prob is a prior distribution over G

The posterior distribution P(G|O) is obtained from Bayes' rule

P(G|O) = α P(O|G) Prob(G)

with the likelihood P(O|G) = sigmoid{β [c(G + ¬O) − c(G + O)]}.

The distribution P(G|O) can be computed exactly or approximately: exactly, using an optimal planner for determining c(G + O) and c(G + ¬O); approximately, using a suboptimal planner for c(G + O) and c(G + ¬O). In either case, 2|G| planner calls are needed.
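Putting it together, a sketch of the recognizer, with plan_cost(G, comply) standing in for the 2|G| planner calls (a hypothetical hook; the costs below are those of the grid example):

```python
import math

def posterior(goals, prior, plan_cost, beta=1.0):
    """P(G|O) = alpha * sigmoid(beta * [c(G+notO) - c(G+O)]) * Prob(G)."""
    unnorm = {}
    for G in goals:
        delta = plan_cost(G, False) - plan_cost(G, True)
        unnorm[G] = prior[G] / (1.0 + math.exp(-beta * delta))
    Z = sum(unnorm.values())
    return {G: w / Z for G, w in unnorm.items()}

# (c(G+O), c(G+notO)) from the grid example; D stands for the other goals
costs = {"B": (4, 6), "A": (8, 8), "C": (8, 8), "D": (8, 4)}
post = posterior(costs, {G: 0.25 for G in costs},
                 lambda G, comply: costs[G][0 if comply else 1])
print(post)    # B is most probable, A and C tie, D trails
```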

Example: Noisy Walk

[Figure: left, a noisy walk on an 11×11 grid with possible targets A, B, C, D, E, F and start I; right, curves of the posterior P(G|O_t) for each target G over 13 time steps]

The graph on the left shows the noisy walk and the possible targets; the curves on the right show the posterior P(G|O) of each possible target G as a function of time.

Summary: Transformations

Classical planning is solved as path-finding in state space; the most used techniques are heuristic search and SAT.

Beyond classical planning, there are two approaches:
- Top-down: solvers for richer models like MDPs and POMDPs
- Bottom-up: compile the non-classical features away

We have followed the second approach, with transformations to eliminate:
- soft goals, when planning with preferences
- uncertainty, in conformant planning
- sensing, for deriving finite-state controllers
- observations, for plan recognition

Other transformations are used for LTL plan constraints, control knowledge, etc.