Planning and search. Lecture 6: Search with non-determinism and partial observability


Today's lecture
Non-deterministic actions. AND-OR search trees.
Search with no observability.
Search with partial observability.
On-line search.

Classical search
Determinism: each action has a unique outcome. If we choose to drive from Arad to Sibiu, the resulting state is Sibiu.
Observability: we can tell which state we are in (are we in Arad?).
Known environment: we know which states there are, what actions are possible, and what their outcomes are (we have a map).

Search with non-deterministic actions
Each action has a set of possible outcomes (resulting states).
A solution is not a sequence of actions but a contingency plan, or strategy: if, after pushing the lift button, the lift arrives, then take the lift; else take the stairs.

AND-OR search trees
In previous search trees, branching corresponds to the agent's choice of actions. Call these OR-nodes.
Branching that corresponds to the environment's choice of outcome for each action: AND-nodes.


AND-OR search: solution
A solution for an AND-OR search problem is a subtree that
(1) has a goal node at every leaf,
(2) specifies one action at each of its OR-nodes,
(3) includes every outcome branch at each of its AND-nodes.

AND-OR search: algorithm
Finds a non-cyclic solution if one exists.

function And-Or-Graph-Search(problem) returns a conditional plan, or failure
    return Or-Search(problem.Initial-State, problem, [])

function Or-Search(state, problem, path) returns a conditional plan, or failure
    if problem.Goal-Test(state) then return the empty plan
    if state is on path then return failure
    for each action in problem.Actions(state) do
        plan ← And-Search(Results(state, action), problem, [state | path])
        if plan ≠ failure then return [action | plan]
    return failure

function And-Search(states, problem, path) returns a conditional plan, or failure
    for each s_i in states do
        plan_i ← Or-Search(s_i, problem, path)
        if plan_i = failure then return failure
    return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_{n-1} then plan_{n-1} else plan_n]
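The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the problem is passed as three plain functions (`goal_test`, `actions`, `results`), a plan is represented as either `[]` or `[action, {outcome_state: subplan}]`, and the lift problem with its state names is a hypothetical encoding of the lift example from the earlier slide.

```python
FAILURE = None  # sentinel meaning "no plan exists"; distinct from the empty plan []


def and_or_graph_search(initial_state, goal_test, actions, results):
    """Return a conditional plan, or FAILURE if no acyclic plan exists."""
    return or_search(initial_state, goal_test, actions, results, path=())


def or_search(state, goal_test, actions, results, path):
    """OR-node: the agent needs only one action that leads to a plan."""
    if goal_test(state):
        return []  # the empty plan
    if state in path:
        return FAILURE  # a cycle: give up on this branch
    for action in actions(state):
        plan = and_search(results(state, action), goal_test, actions, results,
                          (state,) + path)
        if plan is not FAILURE:
            return [action, plan]
    return FAILURE


def and_search(states, goal_test, actions, results, path):
    """AND-node: every possible outcome state needs its own sub-plan."""
    branches = {}
    for s in states:
        plan = or_search(s, goal_test, actions, results, path)
        if plan is FAILURE:
            return FAILURE
        branches[s] = plan  # read as "if the outcome is s then follow plan"
    return branches


# Hypothetical lift problem: pushing the button has two possible outcomes.
TRANSITIONS = {
    ("ground", "push_button"): ["lift_arrived", "lift_absent"],
    ("lift_arrived", "take_lift"): ["upstairs"],
    ("lift_absent", "take_stairs"): ["upstairs"],
}


def lift_actions(state):
    return [a for (s, a) in TRANSITIONS if s == state]


def lift_results(state, action):
    return TRANSITIONS[(state, action)]


plan = and_or_graph_search("ground", lambda s: s == "upstairs",
                           lift_actions, lift_results)
print(plan)
```

The returned plan starts with `push_button` and then branches on the observed outcome: take the lift if it arrived, take the stairs otherwise. Using a dictionary for the AND-node keeps the "if s_i then plan_i" structure explicit.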

Cyclic solutions
Add a while loop to the plan: if the agent is in a state where the action failed, repeat the action until it succeeds.
This works provided that each outcome of a non-deterministic action eventually occurs.
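Executing such a cyclic plan can be sketched as a loop that retries an action for as long as it leaves the state unchanged. The function name, the `step` callback standing in for the environment, and the `max_tries` safety bound are all assumptions made for this illustration; the bound exists only because a program cannot rely on the "eventually occurs" assumption.

```python
def repeat_until_changed(state, action, step, max_tries=1000):
    """Repeat `action` while it fails, i.e. while it leaves the agent in `state`.

    `step(state, action)` is a hypothetical environment call returning the
    next state. Returns the new state and the number of attempts made.
    """
    tries = 0
    while tries < max_tries:
        tries += 1
        new_state = step(state, action)
        if new_state != state:
            return new_state, tries
    raise RuntimeError("action never succeeded within max_tries attempts")


# Illustrative environment: moving Right fails twice, then succeeds.
outcomes = iter(["A", "A", "B"])
final_state, attempts = repeat_until_changed("A", "Right",
                                             lambda s, a: next(outcomes))
print(final_state, attempts)
```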

Searching with no observations
Sensorless (or conformant) problem: no observations at all.
The agent knows that it is in one of a set of possible physical states.
This set of physical states is called a belief state.

Belief-state search space
Suppose the underlying physical problem $P$ is defined by $\text{States}_P$, $\text{Actions}_P$, $\text{Result}_P$, $\text{Goal-Test}_P$.
Belief states: all possible subsets of the set of physical states; there are $2^N$ of them if there are $N$ states in $P$.
Initial state: typically the set of all possible physical states (the agent has no idea which state it is really in).
Actions: assume illegal actions have no effect on the environment. Then $\text{Actions}(b) = \bigcup_{s \in b} \text{Actions}_P(s)$.

Belief-state search space contd.
Results: the set of outcomes of the action applied to the different physical states in $b$.
For deterministic actions, $b' = \text{Result}(b, a) = \{ s' : s' = \text{Result}_P(s, a) \text{ for } s \in b \}$.
For non-deterministic actions, $b' = \text{Result}(b, a) = \bigcup_{s \in b} \text{Results}_P(s, a)$.
Goal test: $b$ satisfies the goal test if all physical states in $b$ do.

Example
[Figure: two rooms, A and B.]
Two physical states: the agent is in room A ($s_1$) or in room B ($s_2$).
Actions: Left, Right.
If the agent starts in the belief state $b = \{s_1, s_2\}$ and performs action Right, the resulting belief state is $b' = \{s_2\}$.
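The belief-state definitions and the two-room example above can be sketched in Python. The encoding is an assumption made for illustration: the string "A" stands for s_1, "B" for s_2, and the physical problem is passed in as functions named `actions_p` and `results_p`, where `results_p` returns a set of outcomes so that the same code covers deterministic actions (singleton sets) and non-deterministic ones.

```python
def actions_p(state):
    """Physical actions; both are available in either room."""
    return {"Left", "Right"}


def results_p(state, action):
    """Outcome set of an action in a physical state (deterministic here)."""
    return {"B"} if action == "Right" else {"A"}


def belief_actions(belief, actions_p):
    """Actions(b): union of the actions available in each state of b."""
    available = set()
    for s in belief:
        available |= set(actions_p(s))
    return available


def belief_result(belief, action, results_p):
    """Result(b, a): union of the outcome sets over all states of b."""
    return frozenset(s2 for s in belief for s2 in results_p(s, action))


def belief_goal_test(belief, goal_test_p):
    """b satisfies the goal test only if every physical state in b does."""
    return all(goal_test_p(s) for s in belief)


b = frozenset({"A", "B"})                     # no idea which room we are in
b_next = belief_result(b, "Right", results_p)
print(sorted(b_next))
```

After Right the belief state shrinks to the single state "B", so the agent knows where it is without ever observing anything.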

Bigger example
Now the agent can also clean dirt (action Suck), and the states also differ in whether the rooms are clean or dirty.

[Figure: belief-state space for the bigger example; transitions are labelled L (Left), R (Right) and S (Suck).]

Problem representation
The number of possible belief states is huge.
Problem representation gets quite complex (working with sets of states).
Later, in the lectures on planning, we will see that a first-order representation (describing sets of states by properties which hold in all of them) is much easier to work with and more efficient.

Searching with observations
Suppose the agent can sense the environment: it can tell whether a room is dirty or not (but only the room it is in, not the next one).
Percept(s) returns the percept for a given state, for example [A, Dirty] if s is a state in which the agent is in room A and the room is dirty.
Transitions now have three steps:
1) Given the belief state $b$ we are in and an action $a$, compute the next belief state (all physical states we may end up in): $\hat{b} = \text{Predict}(b, a)$.
2) For each physical state in $\text{Predict}(b, a)$, compute the percept in that state: $\text{Possible-Percepts}(\hat{b}) = \{ o : o = \text{Percept}(s) \text{ and } s \in \hat{b} \}$.
3) For each possible percept $o$, determine the belief state after executing $a$ and observing $o$: $\text{Update}(\hat{b}, o) = \{ s : o = \text{Percept}(s) \text{ and } s \in \hat{b} \}$.

Searching with observations contd.
$\text{Results}(b, a) = \{ b_o : b_o = \text{Update}(\text{Predict}(b, a), o) \text{ and } o \in \text{Possible-Percepts}(\text{Predict}(b, a)) \}$
We can now apply the AND-OR search algorithm.
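The predict, possible-percepts and update steps can be sketched for a two-room vacuum world with local dirt sensing. The state encoding `(location, dirty_a, dirty_b)` and the deterministic transition function `result_p` are assumptions made for this illustration. One design choice differs from the set notation: `results` returns a dictionary from each percept to its belief state, so that a conditional plan can branch on the percept.

```python
def result_p(state, action):
    """Deterministic physical transition (assumed model of the vacuum world)."""
    loc, dirty_a, dirty_b = state
    if action == "Left":
        return ("A", dirty_a, dirty_b)
    if action == "Right":
        return ("B", dirty_a, dirty_b)
    if action == "Suck":
        return (loc, False, dirty_b) if loc == "A" else (loc, dirty_a, False)
    raise ValueError(f"unknown action: {action}")


def percept(state):
    """The agent senses its location and the dirt in its current room only."""
    loc, dirty_a, dirty_b = state
    dirty_here = dirty_a if loc == "A" else dirty_b
    return (loc, "Dirty" if dirty_here else "Clean")


def predict(belief, action):
    """Step 1: all physical states the agent may end up in."""
    return frozenset(result_p(s, action) for s in belief)


def possible_percepts(belief_hat):
    """Step 2: the percepts that could be observed in the predicted states."""
    return {percept(s) for s in belief_hat}


def update(belief_hat, o):
    """Step 3: keep the predicted states that are consistent with percept o."""
    return frozenset(s for s in belief_hat if percept(s) == o)


def results(belief, action):
    """One belief state per possible percept, keyed by that percept."""
    belief_hat = predict(belief, action)
    return {o: update(belief_hat, o) for o in possible_percepts(belief_hat)}


# The agent is in room A, knows A is dirty, and knows nothing about room B.
b = frozenset({("A", True, True), ("A", True, False)})
outcomes = results(b, "Right")
for o, b_o in outcomes.items():
    print(o, sorted(b_o))
```

After moving Right the agent observes either a dirty or a clean room B, and each percept narrows the belief state to a single physical state. These percept-labelled belief states play the role of the AND-node outcomes when the AND-OR search algorithm is applied.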

On-line search
All search problems considered so far are off-line: the solution is found before the agent starts acting.
On-line search interleaves search and acting.
It is necessary in unknown environments, where the agent does not know what states exist or what its actions do.
Canonical example: a robot placed in an unknown environment of which it must produce a map (exploration problem).
Finding a way out of a labyrinth...
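As an illustration of interleaving search and acting, here is a simple depth-first exploration sketch in which the agent learns a map only by executing actions. It is not presented as the lecture's algorithm, and it rests on several assumptions: actions are deterministic, every action can be undone by a known inverse, and the agent can recognise a state it has visited before. The maze, the function names and the `INVERSE` table are hypothetical.

```python
# Hidden environment: the agent may only query it through actions() and
# execute(), and only for the state it is currently in.
MAZE = {
    ((0, 0), "E"): (1, 0),
    ((1, 0), "W"): (0, 0),
    ((1, 0), "N"): (1, 1),
    ((1, 1), "S"): (1, 0),
}
INVERSE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def actions(state):
    return [a for (s, a) in MAZE if s == state]


def execute(state, action):
    return MAZE[(state, action)]


def explore(start):
    """Act, record what was observed, and physically backtrack when stuck."""
    world_map = {}      # (state, action) -> observed next state
    visited = {start}
    trail = []          # actions taken from start to the current state
    state = start
    while True:
        untried = [a for a in actions(state) if (state, a) not in world_map]
        if untried:
            action = untried[0]
            nxt = execute(state, action)
            world_map[(state, action)] = nxt
            if nxt in visited:
                # Nothing new there: step straight back (if we moved at all).
                if nxt != state:
                    back = execute(nxt, INVERSE[action])
                    world_map[(nxt, INVERSE[action])] = back
                    state = back
            else:
                visited.add(nxt)
                trail.append(action)
                state = nxt
        elif trail:
            # All actions tried here: undo the last move to backtrack.
            undo = INVERSE[trail.pop()]
            prev = execute(state, undo)
            world_map[(state, undo)] = prev
            state = prev
        else:
            return world_map  # back at the start with nothing left to try


learned = explore((0, 0))
print(learned == MAZE)
```

The agent ends where it started with a complete map of the reachable part of the maze. The extra back-and-forth moves are the price of learning the outcomes by acting rather than by consulting a model.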

What to read for the next lecture
First-order logic: chapter 8 in Russell and Norvig, 3rd edition.