An Algorithm better than AO*?

An Algorithm Better than AO*?
Blai Bonet, Universidad Simón Bolívar, Caracas, Venezuela
Héctor Geffner, ICREA and Universitat Pompeu Fabra, Barcelona, Spain
7/2005

Motivation

Heuristic search methods can be efficient but lack a common foundation: IDA*, AO*, Alpha-Beta, ...

Dynamic programming methods such as Value Iteration are general but not as efficient.

Question: can we get the best of both, i.e., generality and efficiency?

The answer is yes, by combining their key ideas:
- Admissible heuristics (lower bounds)
- Learning (value updates as in Value Iteration, RTDP, etc.)

What does the proposed integration give us?

An algorithm schema, called LDFS, that is simple, general, and efficient:

- simple because it can be expressed in a few lines of code; indeed LDFS = Depth-First Search + Learning
- general because it handles many models: OR Graphs (IDA*), AND/OR Graphs (AO*), Game Trees (Alpha-Beta), MDPs, etc.
- efficient because it reduces to state-of-the-art algorithms in many of these models, while in others it yields new competitive algorithms; e.g., LDFS = IDA* + transposition tables over OR-Graphs and MTD(-∞) over Game Trees

We also show that LDFS is better than AO* over Max AND/OR Graphs...

What does the proposed integration give us? (cont'd)

Like AO*, RTDP, and LRTA*, LDFS combines lower bounds with learning, but its motivation and goals are slightly different.

By accounting for and generalizing existing algorithms, we aim to uncover the three key computational ideas that underlie them all, so that nothing else is left out. These ideas are:

- Depth-First Search
- Lower Bounds
- Learning

It is also useful to know that, say, a new MDP algorithm reduces to well-known and tested algorithms when applied to OR-Graphs or Game Trees.

Models

1. a discrete and finite state space S,
2. an initial state s0 ∈ S,
3. a non-empty set of terminal states S_T ⊆ S,
4. actions A(s) ⊆ A applicable in each non-terminal state s,
5. a function F that maps states and actions into sets of states, F(a, s) ⊆ S,
6. action costs c(a, s) for non-terminal states s, and
7. terminal costs c_T(s) for terminal states.

Deterministic: |F(a, s)| = 1; Non-Deterministic: |F(a, s)| ≥ 1; MDPs: probabilities P_a(s'|s) for s' ∈ F(a, s) that add up to 1...
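
The seven components above can be collected in a small container; a minimal Python sketch, with illustrative field names not taken from the slides:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

# Hypothetical container for the seven model components;
# names are illustrative, not from the paper.
@dataclass(frozen=True)
class Model:
    states: FrozenSet[str]                      # 1. finite state space S
    s0: str                                     # 2. initial state
    terminals: FrozenSet[str]                   # 3. terminal states S_T
    actions: Callable[[str], Tuple[str, ...]]   # 4. A(s)
    F: Callable[[str, str], Tuple[str, ...]]    # 5. F(a, s): successor set
    cost: Callable[[str, str], float]           # 6. c(a, s)
    terminal_cost: Callable[[str], float]       # 7. c_T(s)

# The deterministic special case has |F(a, s)| = 1 for every a and s:
def is_deterministic(m: Model) -> bool:
    return all(len(m.F(a, s)) == 1
               for s in m.states if s not in m.terminals
               for a in m.actions(s))
```

A one-action toy instance then instantiates the class directly; non-deterministic models simply return longer successor tuples from `F`.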

Solutions

(Optimal) solutions can all be expressed in terms of a value function V satisfying the Bellman equation:

V(s) = c_T(s) if s is terminal; V(s) = min_{a ∈ A(s)} Q_V(a, s) otherwise

where Q_V(a, s) stands for the cost-to-go value defined as:

- c(a, s) + V(s'), s' ∈ F(a, s)                   for OR Graphs
- c(a, s) + max_{s' ∈ F(a,s)} V(s')               for Max AND/OR Graphs
- c(a, s) + sum_{s' ∈ F(a,s)} V(s')               for Add AND/OR Graphs
- c(a, s) + sum_{s' ∈ F(a,s)} P_a(s'|s) V(s')     for MDPs
- max_{s' ∈ F(a,s)} V(s')                         for Game Trees

A policy (solution) π maps states into actions, must be closed around s0, and is optimal if π(s) = argmin_{a ∈ A(s)} Q_V(a, s) for V satisfying the Bellman equation.
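
The five instantiations of Q_V above can be written down directly; a minimal Python sketch (the function and model-type names are illustrative, not from the paper):

```python
# Cost-to-go Q_V(a, s) for each model class on this slide.
# c = c(a, s); succ_values = [V(s') for s' in F(a, s)];
# probs = [P_a(s'|s) for s' in F(a, s)] (MDPs only).
def q_value(model_type, c, succ_values, probs=None):
    if model_type == "or":        # OR Graphs: single successor
        return c + succ_values[0]
    if model_type == "max":       # Max AND/OR Graphs
        return c + max(succ_values)
    if model_type == "add":       # Add AND/OR Graphs
        return c + sum(succ_values)
    if model_type == "mdp":       # MDPs: expected cost-to-go
        return c + sum(p * v for p, v in zip(probs, succ_values))
    if model_type == "game":      # Game Trees: no cost term
        return max(succ_values)
    raise ValueError(model_type)
```

This is the sense in which the models are "encoded in the form of Q_V": the surrounding algorithms never change, only this function does.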

Value Iteration (VI): A general solution method

1. Start with an arbitrary cost function V
2. Repeat until the residual over all s is 0 (i.e., LHS = RHS): update V(s) := min_{a ∈ A(s)} Q_V(a, s) for all s
3. Return π_V(s) = argmin_{a ∈ A(s)} Q_V(a, s)

VI is simple and general (models encoded in the form of Q_V), but also exhaustive (considers all states) and affected by dead-ends (V(s) = ∞).

Both problems are solvable using the initial state s0 and a lower bound V...
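
The three steps above, specialized to Add AND/OR Graphs (which covers deterministic OR-Graphs as the single-successor case), might look like the following sketch; the instance used below is a toy, not from the slides:

```python
# Value Iteration sketch: synchronous sweeps over all states until the
# residual (largest change of V in a sweep) drops to ~0.
def value_iteration(states, terminals, actions, F, cost, c_T, eps=1e-9):
    V = {s: (c_T(s) if s in terminals else 0.0) for s in states}
    while True:
        residual = 0.0
        for s in sorted(states):
            if s in terminals:
                continue
            # Bellman update with the additive Q_V(a, s)
            new_v = min(cost(a, s) + sum(V[sp] for sp in F(a, s))
                        for a in actions(s))
            residual = max(residual, abs(new_v - V[s]))
            V[s] = new_v
        if residual <= eps:
            return V
```

Note how the sweep visits every state regardless of reachability from s0; this is the exhaustiveness that Find-and-Revise and LDFS avoid.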

Find-and-Revise: A Selective Schema

Assume V admissible (V ≤ V*) and monotonic (V(s) ≤ min_{a ∈ A(s)} Q_V(a, s)). Define s as inconsistent if V(s) < min_{a ∈ A(s)} Q_V(a, s).

1. Start with a lower bound V
2. Repeat until no more inconsistent states are found:
   a. Find an inconsistent s reachable from s0 following π_V
   b. Update V(s) to min_{a ∈ A(s)} Q_V(a, s)
3. Return π_V(s) = argmin_{a ∈ A(s)} Q_V(a, s)

Find-and-Revise yields an optimal π in at most sum_s [V*(s) - V(s)] iterations (provided integer costs and no probabilities).

The proposed LDFS = Find-and-Revise with:
- Find = a DFS that backtracks on inconsistent states,
- Updates states on backtracks, and
- Labels as solved states s with no inconsistencies beneath.

Learning in Depth-First Search (LDFS)

LDFS-DRIVER(s0):
  begin
    repeat solved := LDFS(s0) until solved
    return (V, π)
  end

LDFS(s):
  begin
    if s is solved or terminal then
      if s is terminal then V(s) := c_T(s)
      mark s as solved
      return true
    flag := false
    foreach a ∈ A(s) do
      if Q_V(a, s) > V(s) then continue
      flag := true
      foreach s' ∈ F(a, s) do
        flag := LDFS(s') & [Q_V(a, s) ≤ V(s)]
        if ¬flag then break
      if flag then break
    if flag then
      π(s) := a
      mark s as solved
    else
      V(s) := min_{a ∈ A(s)} Q_V(a, s)
    return flag
  end
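
Assuming nonnegative action costs (so that V = 0 is an admissible initial lower bound), the driver and LDFS above can be turned into runnable Python for the additive/deterministic case; the helper names and the toy instance in the usage example are illustrative, not from the slides:

```python
# LDFS sketch for additive models (deterministic OR-Graphs included).
# V starts at the admissible lower bound 0 (nonnegative costs assumed).
def ldfs_solve(s0, terminals, actions, F, cost, c_T):
    V, pi, solved = {}, {}, set()

    def q(a, s):                    # Q_V(a, s) = c(a, s) + sum of V(s')
        return cost(a, s) + sum(V.setdefault(sp, 0.0) for sp in F(a, s))

    def ldfs(s):
        if s in solved or s in terminals:
            if s in terminals:
                V[s] = c_T(s)
            solved.add(s)
            return True
        for a in actions(s):
            if q(a, s) > V.setdefault(s, 0.0):    # skip inconsistent actions
                continue
            flag = True
            for sp in F(a, s):
                flag = ldfs(sp) and q(a, s) <= V[s]
                if not flag:
                    break
            if flag:
                pi[s] = a               # action that stayed consistent
                solved.add(s)
                return True
        V[s] = min(q(a, s) for a in actions(s))   # update on backtrack
        return False

    while not ldfs(s0):                 # driver: repeat until s0 is solved
        pass
    return V, pi
```

On a toy graph where A reaches goal G directly for cost 5 or via B for cost 1 + 1, the search converges to V(A) = 2 with π(A) choosing the cheaper route through B.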

Properties of LDFS and Bounded LDFS

LDFS computes π* for all models if V is admissible (i.e., V ≤ V*).

For OR-Graphs and monotone V, LDFS = IDA* + transposition tables.
For Game Trees and V = -∞, Bounded LDFS = MTD(-∞).
For Additive models, LDFS = Bounded LDFS; for Max models, LDFS ≠ Bounded LDFS.

LDFS (like AO*, min-max, etc.) computes optimal solution graphs in which each node roots an optimal solution subgraph; over Max models, this isn't needed. Bounded LDFS fixes this, enforcing consistency only where needed.

Empirical Evaluation: Algorithms, Heuristics, Domains

Algorithms: VI, AO*/CFC_rev, Min-Max LRTA*, LDFS, Bounded LDFS
Heuristics: h = 0 and two domain-independent heuristics h1 and h2

Domains:
- Coins: find the counterfeit coin among N coins; N = 10, 20, ..., 60.
- Diagnosis: find the true state of a system among M states with N binary tests: in one case, N = 10 and M in {10, 20, ..., 60}; in the second, M = 60 and N in {10, 12, ..., 28}.
- Rules: derivation of atoms in acyclic rule systems with N atoms, and at most R rules per atom and M atoms per rule body; R = M = 50 and N in {5000, 10000, ..., 20000}.
- MTS: a predator must catch a prey that moves non-deterministically to a non-blocked adjacent cell in a given random maze of size N x N; N = 15, 20, ..., 40.

[Table omitted: problem statistics (|S|, V*, VI iterations N_vi, |A|, |F|, |π|) for representative coins, mts, diag, and rules instances.]

Empirical Evaluation: Results (1)

[Plots omitted: runtime in seconds vs. problem size for Coins (number of coins), MTS (maze size), and Rules (number of atoms), each under h = 0, h1, and h2, comparing VI, AO*/CFC, LRTA*, LDFS, and Bounded LDFS.]

Empirical Evaluation: Results (2)

[Plots omitted: runtime in seconds for Diagnosis with #tests = 10 (vs. number of states) and #states = 60 (vs. number of tests), under h = 0, h1, and h2.]

Runtimes are roughly Bounded LDFS < LDFS < LRTA* < AO* < VI, except in Rules where LRTA* is best.

Conclusions

A unified computational framework that is simple, general, and efficient: LDFS = Depth-First Search + Learning.

- Reduces to state-of-the-art algorithms in some models (OR Graphs and Game Trees)
- Yields new competitive algorithms in others (e.g., AND/OR Graphs)
- Shows that the ideas underlying a wide range of algorithms reduce to: Depth-First Search, Lower Bounds, Learning