Artificial Intelligence. Non-deterministic state model. Model for non-deterministic problems. Solutions. Blai Bonet


Artificial Intelligence
Blai Bonet
Universidad Simón Bolívar, Caracas, Venezuela

Non-deterministic state model

Model for non-deterministic problems

State models with non-deterministic actions correspond to tuples $(S, A, s_{\text{init}}, S_G, F, c)$ where:
- $S$ is a finite set of states
- $A$ is a finite set of actions, where $A(s)$, for $s \in S$, is the set of actions applicable at state $s$
- $s_{\text{init}} \in S$ is the initial state
- $S_G \subseteq S$ is the set of goal states
- $F : S \times A \to 2^S \setminus \{\emptyset\}$ is the non-deterministic transition function; for $s \in S$ and $a \in A(s)$, $F(s, a)$ is the set of possible successors
- $c : S \times A \to \mathbb{R}_{\ge 0}$ are the non-negative action costs

Solutions

- Solutions cannot be linear sequences of actions that map $s_{\text{init}}$ into a goal state, because actions are non-deterministic
- Solutions are strategies or policies that map $s_{\text{init}}$ into a goal state
- Policies are functions $\pi : S \to A$ that map each state $s$ into the action $\pi(s)$ to apply at $s$ (implied constraint: $\pi(s) \in A(s)$)
- Unlike acyclic solutions for AND/OR graphs, solutions may be cyclic
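To make the tuple concrete, here is a minimal Python sketch of one possible representation of such a model and of a policy. The names (`NonDetModel`, `applicable`, etc.) are illustrative choices of mine, not notation from the slides.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Set

State = Hashable
Action = Hashable

class NonDetModel:
    """One possible encoding of a non-deterministic model (S, A, s_init, S_G, F, c)."""

    def __init__(self,
                 states: Set[State],                                  # S
                 applicable: Callable[[State], Set[Action]],          # A(s)
                 s_init: State,                                       # initial state
                 goals: Set[State],                                   # S_G
                 F: Callable[[State, Action], FrozenSet[State]],      # F(s, a), non-empty
                 cost: Callable[[State, Action], float]):             # c(s, a) >= 0
        self.states, self.applicable = states, applicable
        self.s_init, self.goals = s_init, goals
        self.F, self.cost = F, cost

# A (memoryless) policy maps states to applicable actions: pi(s) in A(s)
Policy = Dict[State, Action]
```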

Example

- Goal: reach the lower right corner $(n-1, 0)$
- Actions: Right (R), Down (D), Right-Down (RD)
- Dynamics: R and D are deterministic; RD is non-deterministic. If RD is applied at $(x, y)$, the possible effects are $(x+1 \bmod n,\ y)$, $(x+1 \bmod n,\ y-1 \bmod n)$, and $(x,\ y-1 \bmod n)$

Executions

Given a state $s$, a finite execution starting at $s$ is an interleaved sequence of states and actions $s_0, a_0, s_1, a_1, \ldots, a_{n-1}, s_n$ such that:
- $s_0 = s$
- $s_{i+1} \in F(s_i, a_i)$ for $i = 0, 1, \ldots, n-1$
- $s_i \notin S_G$ for $i < n$ (i.e., once the goal is reached, the execution ends)

A maximal execution starting at $s$ is a (possibly infinite) sequence $\tau = s_0, a_0, s_1, a_1, \ldots$ such that:
- if $\tau$ is finite, it is a finite execution ending in a goal state
- if $\tau$ is infinite, each finite prefix of $\tau$ is a finite execution

An execution $(s_0, a_0, s_1, \ldots)$ is an execution for $\pi$ iff $a_i = \pi(s_i)$ for all $i \ge 0$

Examples of executions

- Policy $\pi_1$ given by $\pi_1(x, y) = {}$ Right if $x < n-1$, Down if $x = n-1$
- Policy $\pi_2$ given by $\pi_2(x, y) = {}$ Right-Down
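A small Python sketch of the example's dynamics and of sampling executions for a policy. It assumes an $n \times n$ grid with wrap-around (mod $n$) moves, as the RD effects above suggest; the concrete value `n = 4` and all function names are mine.

```python
import random

n = 4                                   # assumed grid size; the slides keep n symbolic
GOAL = (n - 1, 0)                       # lower right corner

def F(state, action):
    """Possible successors of applying `action` at grid cell (x, y).

    Wrap-around (mod n) moves are assumed for R and D as well, matching the
    mod-n effects given for RD.
    """
    x, y = state
    right, down = ((x + 1) % n, y), (x, (y - 1) % n)
    if action == "R":                   # deterministic
        return {right}
    if action == "D":                   # deterministic
        return {down}
    if action == "RD":                  # non-deterministic: right, right-down, or down
        return {right, ((x + 1) % n, (y - 1) % n), down}
    raise ValueError(action)

def pi1(state):                         # strong solution from the slides
    x, _ = state
    return "R" if x < n - 1 else "D"

def pi2(state):                         # strong cyclic solution from the slides
    return "RD"

def sample_execution(policy, s, max_steps=50):
    """Sample (a prefix of) one maximal execution for `policy` starting at s."""
    trace = [s]
    for _ in range(max_steps):
        if s == GOAL:                   # executions end once the goal is reached
            break
        a = policy(s)
        s = random.choice(sorted(F(s, a)))   # the environment picks a successor
        trace += [a, s]
    return trace

print(sample_execution(pi1, (0, n - 1)))
print(sample_execution(pi2, (0, n - 1)))
```

Running it prints one execution of $\pi_1$ (always of bounded length) and one sampled execution of $\pi_2$, which may be arbitrarily long but reaches the goal on every fair run.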

Examples of executions

(Slides illustrating step-by-step executions of the policy $\pi_2(x, y) = {}$ Right-Down on the grid.)

Fairness

Let $\tau = s_0, a_0, s_1, \ldots$ be a maximal execution starting at $s_0 = s$.
- We say that $\tau$ is a fair execution if either $\tau$ is finite, or, whenever the pair $(s, a)$ appears infinitely often in $\tau$, then $(s, a, s')$ also appears infinitely often in $\tau$ for every $s' \in F(s, a)$
- Alternatively, $\tau$ is an unfair execution iff $\tau$ is infinite and there are $s$, $a$ and $s' \in F(s, a)$ such that $(s, a)$ appears infinitely often in $\tau$ but $(s, a, s')$ appears only a finite number of times

For instance, in the grid example, the infinite execution for $\pi_2$ that always realizes the outcome $(x+1 \bmod n, y)$ of Right-Down, circling a row forever, is unfair: pairs $((x, y), \text{RD})$ appear infinitely often, yet the outcomes that move down never occur.

Solution concepts

Let $P = (S, A, s_{\text{init}}, S_G, F, c)$ be a non-deterministic model and $\pi : S \to A$ a policy:
- $\pi$ is a strong solution for state $s$ iff all maximal executions for $\pi$ starting at $s$ are finite (and thus, by definition, end in a goal state)
- $\pi$ is a strong cyclic solution for state $s$ iff every maximal execution for $\pi$ starting at $s$ that does not end in a goal state is unfair

The following can be shown:
- if $\pi$ is a strong solution for state $s$, there is a bound $M$ such that all maximal executions have length at most $M$
- if $\pi$ is a strong cyclic solution for state $s$ but not a strong solution, the length of the executions ending at goal states is unbounded
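Since a strong solution rules out infinite executions, it can be tested directly on the graph induced by $\pi$: follow every outcome in $F(s, \pi(s))$ and verify that no cycle through non-goal states is reachable. This graph reformulation is my reading of the bounded-length remark above (it is not spelled out on the slides), and the helper below is a hypothetical sketch.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Set

State = Hashable
Action = Hashable

def is_strong_solution(pi: Dict[State, Action],
                       F: Callable[[State, Action], FrozenSet[State]],
                       goals: Set[State],
                       s_init: State) -> bool:
    """True iff every maximal execution for pi from s_init is finite (hence reaches a goal).

    Equivalently: following all outcomes of pi, no cycle through non-goal states is
    reachable, and pi prescribes an action at every reachable non-goal state.
    """
    status: Dict[State, object] = {}       # "open" while on the DFS stack, then True/False

    def dfs(s: State) -> bool:
        if s in goals:
            return True                    # executions stop at goal states
        if s in status:
            # "open" means we returned to a state on the current path: a cycle,
            # hence an infinite execution exists
            return status[s] is True
        if s not in pi:
            return False                   # no action prescribed: treat as failure
        status[s] = "open"
        ok = all(dfs(t) for t in F(s, pi[s]))
        status[s] = ok
        return ok

    return dfs(s_init)
```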

Solutions for the example

- $\pi_1(x, y)$ given by Right when $x < n-1$ and Down when $x = n-1$ is a strong solution
- $\pi_2(x, y) = {}$ Right-Down is a strong cyclic solution
- $\pi_3(x, y) = {}$ Down is not a solution

Assigning costs to policies

- If $\pi$ is a strong solution for $s_{\text{init}}$, all maximal executions are bounded in length and a cost can be assigned
- If $\pi$ is a strong cyclic solution for $s_{\text{init}}$ but not a strong solution, it is not clear how to assign a cost to $\pi$
- However, we can consider the best and worst costs for $\pi$ at $s$:

$$V^{\pi}_{\text{best}}(s) = \begin{cases} 0 & \text{if } s \in S_G \\ c(s, \pi(s)) + \min\{ V^{\pi}_{\text{best}}(s') : s' \in F(s, \pi(s)) \} & \text{if } s \notin S_G \end{cases}$$

$$V^{\pi}_{\text{worst}}(s) = \begin{cases} 0 & \text{if } s \in S_G \\ c(s, \pi(s)) + \max\{ V^{\pi}_{\text{worst}}(s') : s' \in F(s, \pi(s)) \} & \text{if } s \notin S_G \end{cases}$$

Characterizing solution concepts

Characterization of solutions using the best/worst costs for $\pi$ at $s$ (see the sketch below):
- $\pi$ is a strong solution for $s_{\text{init}}$ iff $V^{\pi}_{\text{worst}}(s_{\text{init}}) < \infty$
- $\pi$ is a strong cyclic solution for $s_{\text{init}}$ iff $V^{\pi}_{\text{best}}(s) < \infty$ for every state $s$ that is reachable from $s_{\text{init}}$ using $\pi$

Additionally, if we define $V_{\max}(s)$ at each state $s$ by

$$V_{\max}(s) = \begin{cases} 0 & \text{if } s \in S_G \\ \min_{a \in A(s)} \big[ c(s, a) + \max\{ V_{\max}(s') : s' \in F(s, a) \} \big] & \text{if } s \notin S_G \end{cases}$$

then there is a strong policy for $s_{\text{init}}$ iff $V_{\max}(s_{\text{init}}) < \infty$.

AND/OR formulation for non-deterministic models

Given a non-deterministic model $P = (S, A, s_{\text{init}}, S_G, F, c)$, we define two different AND/OR models:
- (Finite) AND/OR model over states: OR vertices for states $s \in S$, and AND vertices for pairs $\langle s, a \rangle$ with $s \in S$ and $a \in A(s)$. Connectors $(s, \langle s, a \rangle)$ with cost $c(s, a)$ for $a \in A(s)$, and connectors $(\langle s, a \rangle, F(s, a))$ with zero cost
- (Infinite but acyclic) AND/OR model over executions: OR vertices are finite executions starting at $s_{\text{init}}$ and ending in states, and AND vertices are finite executions starting at $s_{\text{init}}$ and ending in actions. Connectors $(\langle s_{\text{init}}, \ldots, s \rangle, \langle s_{\text{init}}, \ldots, s, a \rangle)$ with cost $c(s, a)$ for $a \in A(s)$, and connectors $(\langle s_{\text{init}}, \ldots, s, a \rangle, \{ \langle s_{\text{init}}, \ldots, s, a, s' \rangle : s' \in F(s, a) \})$ with zero cost
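As referenced above, here is a minimal Python sketch, under my own explicit-dictionary encoding (the names `evaluate_policy`, `reachable`, and the tiny example model are mine), of computing $V^{\pi}_{\text{best}}$ and $V^{\pi}_{\text{worst}}$ by iterating their defining equations from $V \equiv \infty$ down to a fixed point, and of applying the characterization above.

```python
from math import inf
from typing import Dict, FrozenSet, Hashable, Set, Tuple

State = Hashable
Action = Hashable
Trans = Dict[Tuple[State, Action], FrozenSet[State]]   # F(s, a)
Cost = Dict[Tuple[State, Action], float]               # c(s, a)

def reachable(pi: Dict[State, Action], F: Trans, goals: Set[State], s_init: State) -> Set[State]:
    """States reachable from s_init by following all outcomes of pi (stopping at goals)."""
    seen, stack = {s_init}, [s_init]
    while stack:
        s = stack.pop()
        if s in goals:
            continue
        for t in F[(s, pi[s])]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def evaluate_policy(pi, F: Trans, c: Cost, goals: Set[State], states: Set[State], worst: bool):
    """Fixed point of V(s) = c(s, pi(s)) + min/max over F(s, pi(s)), starting from +inf."""
    agg = max if worst else min
    V = {s: (0.0 if s in goals else inf) for s in states}
    changed = True
    while changed:                       # stabilizes within |S| sweeps
        changed = False
        for s in states:
            if s in goals:
                continue
            new = c[(s, pi[s])] + agg(V[t] for t in F[(s, pi[s])])
            if new < V[s]:
                V[s], changed = new, True
    return V

# Tiny hypothetical example: s0 --a--> {s0, g} with cost 1; g is the goal.
F: Trans = {("s0", "a"): frozenset({"s0", "g"})}
c: Cost = {("s0", "a"): 1.0}
pi = {"s0": "a"}
states, goals = {"s0", "g"}, {"g"}

V_best = evaluate_policy(pi, F, c, goals, states, worst=False)
V_worst = evaluate_policy(pi, F, c, goals, states, worst=True)
print("strong:", V_worst["s0"] < inf)                       # False: the self-loop may repeat forever
print("strong cyclic:", all(V_best[s] < inf
                            for s in reachable(pi, F, goals, "s0")))  # True
```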

Finding strong solutions

- If there is a strong solution (i.e., if $V_{\max}(s_{\text{init}}) < \infty$), AO* finds one such solution in the infinite and acyclic AND/OR model
- Conversely, if AO* solves the infinite and acyclic AND/OR model, the solution found by AO* is a strong solution
- Another method is to select actions greedily with respect to $V_{\max}$: indeed, the policy $\pi$ defined by
$$\pi(s) = \operatorname{argmin}_{a \in A(s)} \big[ c(s, a) + \max\{ V_{\max}(s') : s' \in F(s, a) \} \big]$$
is a strong solution starting at $s_{\text{init}}$ when $V_{\max}(s_{\text{init}}) < \infty$

Finding strong cyclic solutions

Strong cyclic solutions are more complex than strong solutions (and so the algorithms are not straightforward). Let us begin by defining $V_{\min}$ in a similar way to $V_{\max}$:

$$V_{\min}(s) = \begin{cases} 0 & \text{if } s \in S_G \\ \min_{a \in A(s)} \big[ c(s, a) + \min\{ V_{\min}(s') : s' \in F(s, a) \} \big] & \text{if } s \notin S_G \end{cases}$$

For every policy $\pi$ and state $s$, we have $0 \le V_{\min}(s) \le V^{\pi}_{\text{best}}(s) \le V^{\pi}_{\text{worst}}(s)$ and $V_{\min}(s) \le V_{\max}(s) \le V^{\pi}_{\text{worst}}(s)$.

Consider a non-deterministic model $(S, A, s_{\text{init}}, S_G, F, c)$. Algorithm for finding a strong cyclic solution for $s_{\text{init}}$ (a sketch in code follows below):
1. Start with $S' = S$ and $A' = A$
2. Find $V'_{\min}$ for states in $S'$ and actions in $A'$ by solving the equations for $V_{\min}$
3. Remove from $S'$ all states $s$ such that $V'_{\min}(s) = \infty$, and remove from $A'(s)$, for all states $s$, the actions leading to removed states
4. Iterate steps 2-3 until reaching a fixed point (the number of iterations is bounded by $|S|$)
5. If state $s_{\text{init}}$ is removed, then there is no strong cyclic solution for $s_{\text{init}}$
6. Else, the greedy policy with respect to the last $V'_{\min}$, given by $\pi(s) = \operatorname{argmin}_{a \in A'(s)} \big[ c(s, a) + \min\{ V'_{\min}(s') : s' \in F(s, a) \} \big]$, is a strong cyclic solution for $s_{\text{init}}$

Solving the equations for $V_{\min}$ (and $V_{\max}$)

The idea is to use the equation defining $V_{\min}$ as an assignment inside an iterative algorithm that updates an initial iterate until convergence.

Algorithm for finding $V_{\min}$ over states in $S'$ and actions in $A'$:
1. Start with the initial iterate $V(s) = \infty$ for every $s \in S'$
2. Update $V$ at each state $s \in S'$ as:
$$V(s) := \begin{cases} 0 & \text{if } s \in S_G \\ \min_{a \in A'(s)} \big[ c(s, a) + \min\{ V(s') : s' \in F(s, a) \} \big] & \text{if } s \notin S_G \end{cases}$$
3. Repeat step 2 until reaching a fixed point (termination is guaranteed!)

At termination, the last iterate $V$ satisfies $V = V_{\min}$.

For $V_{\max}$, replace the inner min by max, and start with $V(s) = 0$ for all $s$. If $V_{\max}(s) < \infty$ for all $s$, the number of iterations is polynomially bounded.
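A compact Python sketch of the pruning algorithm above, again under my own explicit-dictionary encoding (the names `solve_vmin` and `strong_cyclic_policy` are mine). It iterates the $V_{\min}$ equation from $V \equiv \infty$, removes states with $V_{\min} = \infty$ and the actions leading to them, repeats until a fixed point, and extracts the greedy policy.

```python
from math import inf
from typing import Dict, FrozenSet, Hashable, Optional, Set, Tuple

State = Hashable
Action = Hashable
Applicable = Dict[State, Set[Action]]                   # A'(s)
Trans = Dict[Tuple[State, Action], FrozenSet[State]]    # F(s, a)
Cost = Dict[Tuple[State, Action], float]                # c(s, a)

def solve_vmin(states: Set[State], A: Applicable, F: Trans, c: Cost, goals: Set[State]):
    """Iterate V(s) := min_a [c(s,a) + min_{s' in F(s,a)} V(s')] from V = +inf to a fixed point."""
    V = {s: (0.0 if s in goals else inf) for s in states}
    changed = True
    while changed:                                       # stabilizes within |S| sweeps
        changed = False
        for s in states:
            if s in goals:
                continue
            new = min((c[(s, a)] + min(V[t] for t in F[(s, a)]) for a in A[s]), default=inf)
            if new < V[s]:
                V[s], changed = new, True
    return V

def strong_cyclic_policy(states, A: Applicable, F: Trans, c: Cost,
                         goals, s_init) -> Optional[Dict[State, Action]]:
    """Strong cyclic solution for s_init via iterated V_min pruning, or None if none exists."""
    S = set(states)
    A = {s: set(A.get(s, ())) for s in S}                # work on copies of S' and A'
    while True:
        V = solve_vmin(S, A, F, c, goals)
        dead = {s for s in S if V[s] == inf}             # states that cannot reach the goal
        if not dead:
            break
        S -= dead
        for s in S:                                      # drop actions leading to removed states
            A[s] = {a for a in A[s] if F[(s, a)] <= S}
    if s_init not in S:
        return None                                      # no strong cyclic solution
    # greedy policy w.r.t. the last V_min over the surviving states and actions
    return {s: min(A[s], key=lambda a: c[(s, a)] + min(V[t] for t in F[(s, a)]))
            for s in S if s not in goals}
```

Starting the iteration from $V \equiv \infty$ means states that cannot reach the goal keep the value $\infty$, which is exactly what step 3 of the algorithm prunes on.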

Summary

- Non-deterministic state model
- Non-determinism: angelic (due to the environment/world), adversarial, or a mix
- Solution concepts: strong and strong cyclic policies
- Algorithms for computing solutions